Sorting database tables before compressing them improves the compression rate. Can we do better than the 
lexicographical order? For minimizing the number of runs in a run-length encoding compression scheme, 
the best approaches to row-ordering are derived from traveling salesman heuristics, although there is a 
significant trade-off between running time and compression. A new heuristic, Multiple Lists, which is a 
variant on Nearest Neighbor that trades off compression for a major running-time speedup, is a good 
option for very large tables. However, for some compression schemes, it is more important to generate long 
runs rather than few runs. For this case, another novel heuristic, Vortex, is promising. We find that we 
can improve run-length encoding up to a factor of 3 whereas we can improve prefix coding by up to 80%: 
these gains are on top of the gains due to lexicographically sorting the table. We prove that the new row 
reordering is optimal (within 10%) at minimizing the runs of identical values within columns, in a few cases. 

Categories and Subject Descriptors: E.4 [Coding and Information Theory]: Data compaction and
compression; H.4.0 [Information Systems Applications]: General

General Terms: Algorithms, Experimentation, Theory 

Additional Key Words and Phrases: Compression, Data Warehousing, Gray codes, Hamming Distance, 
Traveling Salesman Problem 

1. INTRODUCTION 

Database compression reduces storage while improving the performance of some queries. It
is commonly recommended to sort tables to improve the compressibility of the tables [Poess
and Potapov 2003] or of the indexes [Lemire et al. 2010]. While it is not always possible or
useful to sort and compress tables, sorting is a critical component of some column-oriented
architectures [Abadi et al. 2008; Holloway and DeWitt 2008].

At the simplest level, we model compressibility by counting runs of identical values within
columns. Thus, we want to reorder rows to minimize the total number of runs, in all columns
(§3). The lexicographical order is the most common row-reordering heuristic for this problem.

Can we beat the lexicographic order? Engineers might be willing to spend extra time 
reordering rows — even for modest gains (10% or 20%) — if the computational cost is accept- 
able. Indeed, popular compression utilities such as bzip2 are often several times slower than 
faster alternatives (e.g., gzip) for similar gains. 

Moreover, minimizing the number of runs is of theoretical interest. Indeed, it reduces to
the Traveling Salesman Problem (TSP) under the Hamming distance, an NP-hard problem
[Trevisan 1997; Ernvall et al. 1985] (§3.1). Yet there have been few attempts to design
and study TSP heuristics with the Hamming distance and even fewer on large data sets.
For the generic TSP, there are several well known heuristics (§3.2), as well as strategies to
scale them up (§3.3). Inspired by these heuristics, we introduce the novel Multiple Lists
heuristic (§3.3.1) which is designed with the Hamming distance and scalability in mind.



While counting runs is convenient, it is an incomplete model. Indeed, several compression
algorithms for databases may be more effective when there are many "long runs" (§4).
Thus, instead of minimizing the number of runs of column values, we may seek to maximize
the number of long runs. We can then test the result with popular compression algorithms
(§6.1). For this new problem, we propose two heuristics: Frequent-Component
(§4.2), and Vortex (§4.3). Vortex is novel.

This work is supported by Natural Sciences and Engineering Research Council of Canada grants 261437
and 155967 and a Quebec/NB Cooperation grant.

Author's addresses: D. Lemire, LICEF Research Center, TELUQ; O. Kaser and E. Gutarra, Dept. of CSAS,
University of New Brunswick, Saint John.

All our contributed heuristics have O(n log n) complexity when the number of columns is
a constant. However, Multiple Lists uses a linear number of random accesses, making it
prohibitive for very large tables: in such cases, we use table partitioning (§3.3.2).

We can assess these TSP heuristics experimentally under the Hamming distance (§5).
Using synthetic data sets (uniform and Zipfian), we find that Vortex is a competitive 
heuristic on Zipfian data. It is one of the best heuristics for generating long runs. Meanwhile, 
Multiple Lists offers a good compromise between speed and run minimization: it can even 
surpass much more expensive alternatives. Unfortunately, it is poor at generating long runs. 

Based on these good results, we apply Vortex and Multiple Lists to realistic tables, 
using various table-encoding techniques. We show that on several data sets both Multiple 
Lists and Vortex can improve compression when compared to the lexicographical order,
especially if the column histograms have high statistical dispersion (§6).

2. RELATED WORK 

Many forms of compression in databases are susceptible to row reordering. For example,
to increase the compression factor, Oracle engineers [Poess and Potapov 2003] recommend
sorting the data before loading it. Moreover, they recommend taking into account the cardi- 
nality of the columns — that is, the number of distinct column values. Indeed, they indicate 
that sorting on low-cardinality columns is more likely to increase the compression factor. 
Poess and Potapov do not quantify the effect of row reordering. However, they report that 
compression gains on synthetic data are small (a factor of 1.4 on TPC-H) but can be much 
larger on real data (a factor of 3.1). The effect on performance varies from slightly longer 
running times to a speedup of 38% on some queries. Loading times are doubled. 

Column-oriented databases and indexes are particularly suitable for compression.
Column-oriented databases such as C-Store use the conventional (lexicographical) sort to
improve compression [Stonebraker et al. 2005; Abadi et al. 2006]. Specifically, a given table
is decomposed into several overlapping projections (e.g., on columns 1,2,3 then on columns
2,3,4) which are sorted and compressed. By choosing projections matching the query workload,
it is possible to surpass a conventional DBMS by orders of magnitude. To validate
their model, Stonebraker et al. used TPC-H with a scale factor of 10: this generated 60 million
rows in the main table. They kept only attributes of type INTEGER and CHAR(1).
On this data, they report a total space usage of 2 GB compared to 2.7 GB for an alternative
column store. They have a 30% storage advantage, and better performance, partly because
they sort their projections before compressing them.

Lemire and Kaser [2011] prove that sorting the projections on the low-cardinality column
first often maximizes compression. They stress that picking the right column order is important
as the compressibility could vary substantially (e.g., by a factor of 2 or 3). They consider
various alternatives to the lexicographical order such as modular and reflected Gray-code
orders or Hilbert orders, and find them ineffective. In contrast, we propose new heuristics
(Vortex and Multiple Lists) that can surpass the lexicographical order. Indeed, when
using a compression technique such as Prefix coding (see §6.1.1), Lemire and Kaser obtain
compression gains of more than 20% due to sorting: using the same compression technique,
on the same data set, we report further gains of 21%. Pourabbas et al. [2012] extend the
strategy by showing that columns with the same cardinality should be ordered from high
skewness to low skewness.

The compression of bitmap indexes also greatly benefits from table sorting. In some experiments,
the sizes of the bitmap indexes are reduced by nearly an order of magnitude [Lemire
et al. 2010]. Of course, everything else being equal, smaller indexes tend to be faster. Meanwhile,
alternatives to the lexicographical order such as Frequent-Component, reflected
Gray-code or Hilbert orders are unhelpful on bitmap indexes [Lemire et al. 2010]. (We
review an improved version of Frequent-Component in §4.2.)



Still in the context of bitmap indexes, Malik and Kender [2007] get good compression
results using a variation on the Nearest Neighbor TSP heuristic. Unfortunately, its quadratic
time complexity makes the processing of large data sets difficult. To improve scalability,
Malik and Kender [2007] also propose a faster heuristic called aHDO which we review in
§3.2. In comparison, our novel Multiple Lists heuristic is also an attempt to get a more
scalable Nearest Neighbor heuristic. Malik and Kender used small data sets having between
204 rows and 34389 rows. All their compressed bitmap indexes were under 10 MB. On their
largest data set, the compression gain from aHDO was 14% when compared to the original
order. Sorting improved compression by 7%, whereas their Nearest Neighbor TSP heuristic
had the best gain at 17%. Pinar et al. [2005] also present good compression results on bitmap
indexes after reordering: on their largest data set (11 MB), they report using a Gray-code
approach to get a compression ratio of 1.64 compared to the original order. Unfortunately,
they do not compare with the lexicographical order.

Sometimes reordering all of the data before compression is not an option. For example,
Fusco et al. [2010; 2012] describe a system where bitmap indexes must be compressed on-the-fly
to index network traffic. They report that their system can accommodate the insertion
of more than a million records per second. To improve compressibility without sacrificing
performance, they cluster the rows using locality sensitive hashing [Gionis et al. 1999]. They
report a compression factor of 2.7 due to this reordering (from 845 MB to 314 MB).



3. MINIMIZING THE NUMBER OF RUNS 

One of the primary benefits of column stores is the compression due to run-length encoding
(RLE) [Abadi et al. 2008; Bruno 2009; Holloway and DeWitt 2008]. Moreover, the most
popular bitmap-index compression techniques are variations on RLE [Wu et al. 2006].

RLE is a compression strategy where runs of identical values are coded using the repeated
value and the length of the run. For example, the sequence aaaaabcc becomes 5 × a, 1 × b,
2 × c. Counters may be stored using a variable number of bits, e.g., using variable-byte
coding [Scholer et al. 2002; Bhattacharjee et al. 2009], Elias delta coding [Scholer et al.
2002] or Golomb coding [Golomb 1966]. Or we may store counters using a fixed number of
bits for faster decoding.
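To make the encoding concrete, the following minimal Python sketch reproduces the aaaaabcc example. The function names `rle_encode` and `rle_decode` are illustrative assumptions, and the counters are kept as plain integers rather than in any of the bit-level encodings cited above.

```python
from itertools import groupby


def rle_encode(values):
    """Return a list of (run_length, value) pairs for a sequence."""
    return [(len(list(group)), value) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (run_length, value) pairs back into a flat list."""
    return [value for length, value in runs for _ in range(length)]


# The sequence from the text: 5 x a, 1 x b, 2 x c.
encoded = rle_encode("aaaaabcc")
```

The number of pairs in `encoded` is the number of runs, which is the quantity that row reordering tries to reduce.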

RLE not only reduces the storage requirement: it also reduces the processing time. For
example, we can compute the component-wise sum (or indeed any O(1) operation) of
two RLE-compressed arrays in time proportional to the total number of runs. In fact, we
sometimes sacrifice compression in favor of speed:

- to help random access, we can add the row identifier to the run length and repeated
value [Abadi et al. 2006] so that 5 × a, 1 × b, 2 × c becomes 5 × a at 1, 1 × b at 6, 2 × c at 7;

- to simplify computations, we can forbid runs from different columns to partially overlap
[Bruno 2009]: unless two runs are disjoint as sets of row identifiers, then one must be
a subset of the other;

- to avoid the overhead of decoding too many counters, we may store single values or short
runs verbatim, without any attempt at compression [Antoshenkov 1995; Wu et al. 2006].



Thus, instead of trying to model each form of RLE compression accurately, we only count 
the total number of runs (henceforth RunCount). 
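As an illustration, RunCount can be computed in one pass over the rows. The sketch below assumes the table is a list of equal-length tuples; the name `run_count` is ours.

```python
def run_count(table):
    """Total number of runs over all columns of a list of equal-length tuples."""
    if not table:
        return 0
    runs = len(table[0])  # the first row starts one run in each column
    for prev, cur in zip(table, table[1:]):
        # every component that changes between successive rows starts a new run
        runs += sum(1 for a, b in zip(prev, cur) if a != b)
    return runs
```

For example, the rows (1, a), (1, b), (2, b) have two runs in each column, hence a RunCount of 4.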



Unfortunately, minimizing RunCount by row reordering is NP-hard [Lemire and Kaser
2011; Olken and Rotem 1986]. Therefore, we resort to heuristics. We examine many possible
alternatives (see Table I).
Table I: Summary of heuristics considered and overall results. Not all methods were tested 
on realistic data; those not tested were either too inefficient for large data, or were clearly 
unpromising after testing on Zipfian data. 

An effective heuristic for the RunCount minimization problem is to sort the rows in
lexicographic order. In the lexicographic order, the first component where two tuples differ
($a_j \neq b_j$ but $a_i = b_i$ for $i < j$) determines which tuple is smaller.

There are alternatives to the lexicographical order. A Gray code is an ordered list of
tuples such that the Hamming distance between successive tuples is one.$^1$ The Hamming
distance is the number of different components between two same-length tuples, e.g.,

d( (a,b,y), (a,d,x) ) = 2.

The Hamming distance is a metric: i.e., $d(x, x) = 0$, $d(x, y) = d(y, x)$ and $d(x, y) + d(y, z) \geq
d(x, z)$. A Gray code over all possible tuples generates an order (henceforth a Gray-code
order): $x < y$ whenever x appears before y in the Gray code. For example, we can use the
mixed-radix reflected Gray-code order [Richards 1986; Knuth 2011] (henceforth Reflected
GC). Consider a two-column table with column cardinalities $N_1$ and $N_2$. We label the
column values from 1 to $N_1$ and 1 to $N_2$. Starting with the tuple (1, 1), we generate all
tuples in Reflected GC order by the following algorithm:

- If the first component is odd then if the second component is less than $N_2$, increment it,
otherwise increment the first component.

- If the first component is even then if the second component is greater than 1, decrement
it, otherwise increment the first component.

E.g., the following list is in Reflected GC order:

$(1, 1), (1, 2), \ldots, (1, N_2), (2, N_2), (2, N_2 - 1), \ldots, (2, 1), (3, 1), \ldots$
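The two rules above translate directly into a short generator. This is a minimal sketch for the two-column case only; the name `reflected_gc_order` and its parameters `n1` and `n2` (the two column cardinalities) are our own.

```python
def reflected_gc_order(n1, n2):
    """Yield all (i, j), 1 <= i <= n1 and 1 <= j <= n2, in Reflected GC order."""
    first, second = 1, 1
    while first <= n1:
        yield (first, second)
        if first % 2 == 1:      # odd first component: second component goes up
            if second < n2:
                second += 1
            else:
                first += 1
        else:                   # even first component: second component goes down
            if second > 1:
                second -= 1
            else:
                first += 1


order = list(reflected_gc_order(3, 2))
```

Successive tuples in `order` differ in exactly one component, as required of a Gray code.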

The generalization to more than two columns is straightforward. Unfortunately, the benefits
of Reflected GC compared to the lexicographic order are small [Malik and Kender 2007;
Lemire and Kaser 2011; Lemire et al. 2010].

1 For a more restrictive definition, we can replace the Hamming distance by the Lee metric [Anantha et al.
2007].

We can bound the optimality of lexicographic orders using only the number of rows and
the cardinality of each column. Indeed, for the problem of minimizing RunCount by row
reordering, lexicographic sorting is $\mu$-optimal [Lemire and Kaser 2011] for a table with
$n$ distinct rows and column cardinalities $N_i$ for $i = 1, \ldots, c$ with

$$\mu = \frac{\sum_{i=1}^{c} \min\left(n, \prod_{j=1}^{i} N_j\right)}{n + c - 1}.$$

To illustrate this formula, consider a table with 1 million distinct rows and four columns having
cardinalities 10, 100, 1000, 10000. Then, we have $\mu \approx 2$ which means that lexicographic
sorting is 2-optimal. To apply this formula in practice, the main difficulty might be to determine
the number of distinct rows, but there are good approximation algorithms [Aouiche
and Lemire 2007; Kane et al. 2010]. We can improve the bound $\mu$ slightly:



Lemma 3.1. For the RunCount minimization problem, sorting the table lexicographically
is $\omega$-optimal for

$$\omega = \frac{\sum_{i=1}^{c} n_{1,i}}{n + c - 1}$$

where c is the number of columns, n is the number of distinct rows, and $n_{1,j}$ is the number
of distinct rows when considering only the first j columns (e.g., $n = n_{1,c}$).

PROOF. Irrespective of the order of the rows, there are at least $n + c - 1$ runs. Yet, under
the lexicographic order, there are no more than $n_{1,i}$ runs in the $i$th column. The result
follows. □
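Both bounds are easy to evaluate numerically. The sketch below assumes that $\mu$ is the sum over columns of $\min(n, N_1 N_2 \cdots N_i)$ divided by $n + c - 1$, and that $\omega$ replaces each term by the number of distinct prefixes of length i; the function names `mu_bound` and `omega_bound` are ours.

```python
from math import prod  # available since Python 3.8


def mu_bound(n, cardinalities):
    """mu from the number of distinct rows n and the column cardinalities."""
    c = len(cardinalities)
    total = sum(min(n, prod(cardinalities[:i + 1])) for i in range(c))
    return total / (n + c - 1)


def omega_bound(rows):
    """omega computed from the table itself (a list of equal-length tuples)."""
    distinct = set(rows)
    n, c = len(distinct), len(rows[0])
    # n_{1,j}: number of distinct rows restricted to the first j columns
    total = sum(len({r[:j] for r in distinct}) for j in range(1, c + 1))
    return total / (n + c - 1)


# Example from the text: 1 million distinct rows, four columns.
mu = mu_bound(10**6, [10, 100, 1000, 10000])
```

On the example, the numerator is 10 + 1000 + 10^6 + 10^6 = 2001010 and the denominator is 1000003, so `mu` is about 2.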

The bound $\omega$ is tight. Indeed, consider a table with $N_1, N_2, \ldots, N_c$ distinct values in
columns $1, 2, \ldots, c$ and such that it has $n = N_1 N_2 \cdots N_c$ distinct rows. The lexicographic
order will generate $N_1 + N_1 N_2 + \cdots + N_1 N_2 \cdots N_c$ runs. In the notation of Lemma 3.1 there
are $\sum_{i=1}^{c} n_{1,i}$ runs. However, we can also order the rows so that there are only $n + c - 1$ runs
by using the Reflected GC order.

We have that $\omega$ is bounded by the number of columns. That is, we have that $1 \leq \omega \leq c$.
Indeed, we have that $n_{1,c} = n$ and $n_{1,i} \geq 1$ so that $\sum_{i=1}^{c} n_{1,i} \geq n + c - 1$ and therefore
$\omega = \frac{\sum_{i=1}^{c} n_{1,i}}{n + c - 1} \geq 1$. We also have that $n_{1,i} \leq n$ so that $\sum_{i=1}^{c} n_{1,i} \leq cn \leq c(n + c - 1)$ and
hence $\omega = \frac{\sum_{i=1}^{c} n_{1,i}}{n + c - 1} \leq c$. In practice, the bound $\omega$ is often larger when c is larger (see



3.1. Run minimization and TSP 

There is much literature about the TSP, including approximation algorithms and many
heuristics, but our run-minimization problem is not quite the TSP: it more resembles a
minimum-weight Hamiltonian path problem because we do not complete the cycle [Cho
and Hong 2000]. In order to use known TSP heuristics, we need a reduction from our
problem to TSP. In particular, we reduce the run-minimization problem to TSP over the
Hamming distance d. Given the rows $r_1, r_2, \ldots, r_n$, RunCount for c columns is given by
the sum of the Hamming distance between the successive rows,

$$c + \sum_{i=1}^{n-1} d(r_i, r_{i+1}).$$

Our goal is to minimize $\sum_{i=1}^{n-1} d(r_i, r_{i+1})$. Introduce an extra row $r_\star$ with the property that
$d(r_\star, r_i) = c$ for any i. We can achieve the desired result under the Hamming distance by
filling in the row $r_\star$ with values that do not appear in the other rows. We solve the TSP
over this extended set $(r_1, \ldots, r_n, r_\star)$ by finding a reordering of the elements $(r'_1, \ldots, r'_n, r_\star)$
minimizing the sum of the Hamming distances between successive rows:

$$d(r'_n, r_\star) + d(r_\star, r'_1) + \sum_{i=1}^{n-1} d(r'_i, r'_{i+1}) = 2c + \sum_{i=1}^{n-1} d(r'_i, r'_{i+1}).$$

Any reordering minimizing $2c + \sum_{i=1}^{n-1} d(r'_i, r'_{i+1})$ also minimizes $\sum_{i=1}^{n-1} d(r'_i, r'_{i+1})$. Thus, we
have reduced the minimization of RunCount by row reordering to TSP. Heuristics for
TSP can now be employed for our problem: after finding a tour $(r'_1, \ldots, r'_n, r_\star)$, we order
the table rows as $r'_1, r'_2, \ldots, r'_n$.
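The reduction can be checked numerically on a small table. In the sketch below, the extra row is built from fresh Python objects so that it differs from every other row in all c components; the helper names `hamming`, `path_cost` and `tour_cost_with_dummy` are our own.

```python
def hamming(r, s):
    """Number of components in which two same-length tuples differ."""
    return sum(1 for a, b in zip(r, s) if a != b)


def path_cost(rows):
    """Sum of Hamming distances between successive rows (RunCount minus c)."""
    return sum(hamming(r, s) for r, s in zip(rows, rows[1:]))


def tour_cost_with_dummy(rows):
    """Cost of the closed tour through the rows and one extra row."""
    c = len(rows[0])
    dummy = tuple(object() for _ in range(c))  # values absent from all rows
    tour = list(rows) + [dummy]
    return sum(hamming(tour[i], tour[(i + 1) % len(tour)])
               for i in range(len(tour)))
```

For any row order, the tour cost equals the path cost plus 2c, so both objectives have the same minimizers.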

Unlike the general TSP, we know of linear-time c-optimal heuristics when using the Hamming
distance. An ordering is discriminating [Cai and Paige 1995] if duplicates are listed
consecutively. By constructing a hash table, we can generate a discriminating order in expected
linear time. It is sufficient for c-optimality.

Lemma 3.2. Any discriminating row ordering is c-optimal for the RunCount mini- 
mization problem. 

Proof. If n is the number of distinct rows, then a discriminating row ordering has at 
most nc runs. Yet any ordering generates at least n runs. This proves the result. □ 
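A discriminating order can be produced with a hash table, as noted before Lemma 3.2. The following sketch uses Python's `collections.Counter` as the hash table; the name `discriminating_order` is ours, and distinct rows are emitted in order of first appearance, which is only one of many valid choices.

```python
from collections import Counter


def discriminating_order(rows):
    """Reorder hashable rows so that duplicates are listed consecutively."""
    counts = Counter(rows)  # hash table: distinct row -> number of occurrences
    ordered = []
    for row, count in counts.items():
        ordered.extend([row] * count)
    return ordered
```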

Moreover, by the triangle inequality, there is a discriminating row order minimizing
the number of runs. In fact, given any row ordering we can construct a discriminating
row ordering with a lesser or equal cost $\sum_{i=1}^{n-1} d(r_i, r_{i+1})$ because of the triangle inequality.
Formally, suppose that we have a non-discriminating order $r_1, r_2, \ldots, r_n$. We can find two
identical tuples ($r_j = r_k$) separated by at least one different tuple ($r_{k+1} \neq r_j$). Suppose $j <
n$. If we move $r_j$ between $r_k$ and $r_{k+1}$, the cost $\sum_{i=1}^{n-1} d(r_i, r_{i+1})$ will change by $d(r_{j-1}, r_{j+1}) -
(d(r_{j-1}, r_j) + d(r_j, r_{j+1}))$: a quantity at most zero by the triangle inequality. If $j = n$, the
cost will change by $-d(r_{j-1}, r_j)$, another non-positive quantity. We can repeat such moves
until the new order is discriminating, which proves the result.

3.2. TSP heuristics 

We want to solve TSP instances with the Hamming distance. For such metrics, one
of the earliest and still unbeaten TSP heuristics is the 1.5-optimal Christofides algorithm
[Christofides 1976; Berman and Karpinski 2006; Gharan et al. 2011]. Unfortunately,
it runs in $O(n^{2.5} (\log n)^{1.5})$ time [Gabow and Tarjan 1991] and even a quadratic running
time would be prohibitive for our application. Thus, we consider faster alternatives
[Johnson and McGeoch 1997; Johnson and McGeoch 2004].



Some heuristics are based on space-filling curves [Platzman and Bartholdi 1989]. Intuitively,
we want to sort the tuples in the order in which they would appear on a curve
visiting every possible tuple. Ideally, the curve would be such that nearby points on the
curve are also nearby under the Hamming distance. In this sense, lexicographic orders (as
well as the Vortex order, see §4.3) belong to this class of heuristics even though they are
not generally considered space-filling curves. Most of these heuristics run in time O(n log n).



2 Unless we explicitly include the number of columns c in the complexity analysis, we consider it to be a
constant.

There are various tour-construction heuristics [Johnson and McGeoch 2004]. These
heuristics work by inserting, or appending, one tuple at a time in the solution. In this sense,
they are greedy heuristics. They all begin with a randomly chosen starting tuple. The
simplest is Nearest Neighbor [Bellmore and Nemhauser 1968]: we append an available
tuple, choosing one of those nearest to the last tuple added. It runs in time $O(n^2)$ (see
also Lemma 3.3). A variation is to also allow tuples to be inserted at the beginning of the
list or appended at the end [Bentley 1992]. Another similar heuristic is Savings [Clarke
and Wright 1964] which is reported to work well with the Euclidean distance [Johnson and
McGeoch 2004]. A subclass of the tour-construction heuristics are the insertion heuristics:
the selected tuple is inserted at the best possible location in the existing tour. They differ
in how they pick the tuple to be inserted:

— Nearest Insertion: we pick a tuple nearest to a tuple in the tour. 

— Farthest Insertion: we pick a tuple farthest from the tuples in the tour. 

— Random Insertion: we pick an available tuple at random. 

One might also pick a tuple whose cost of insertion is minimal, leading to an $O(n^2 \log n)$
heuristic. Both this approach and Nearest Insertion are 2-optimal, but the named insertion
heuristics are in $O(n^2)$ [Rosenkrantz et al. 1977]. There are many variations [Kahng
and Reda 2004].
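For reference, the basic Nearest Neighbor heuristic described earlier in this section can be sketched in a few lines. This is a naive quadratic-time version under the Hamming distance, not a tuned implementation; the names `hamming` and `nearest_neighbor` and the `seed` parameter are our own.

```python
import random


def hamming(r, s):
    """Number of components in which two same-length tuples differ."""
    return sum(1 for a, b in zip(r, s) if a != b)


def nearest_neighbor(rows, seed=0):
    """Greedy tour: repeatedly append a remaining row closest to the last one."""
    if not rows:
        return []
    remaining = list(rows)
    # start from a randomly chosen tuple
    current = remaining.pop(random.Random(seed).randrange(len(remaining)))
    tour = [current]
    while remaining:
        # linear scan over the remaining rows: O(n) per step, O(n^2) overall
        best = min(range(len(remaining)),
                   key=lambda i: hamming(current, remaining[i]))
        current = remaining.pop(best)
        tour.append(current)
    return tour
```

Ties between equally near rows are broken by position here; other tie-breaking rules are possible.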

Multiple Fragment (or Greedy) is a bottom-up heuristic: initially, each tuple
constitutes a fragment of a tour, and fragments of tours are repeatedly merged [Bentley
1992]. The distance between fragments is computed by comparing the first and last tuples
of both fragments. Under the Hamming distance, there is a (c + 1)-pass implementation
strategy: first merge fragments with Hamming distance zero, then merge fragments with
Hamming distance one and so on. It runs in time $O(n^2 c^2)$.

Finally, the last class of heuristics are those beginning with an existing tour. We
continue trying to improve the tour until it is no longer possible or another stopping criterion
is met. There are many "tour-improvement techniques" [Helsgaun 2000; Applegate et al.
2003]. Several heuristics break the tour and attempt to reconstruct a better one [Croes 1958;
Lin and Kernighan 1973; Helsgaun 2000; Applegate et al. 2003].

Malik and Kender [2007] propose the aHDO heuristic which permutes successive tuples to
improve the solution. Pinar et al. [2005] describe a similar scheme, where they consider
permuting tuples that are not immediately adjacent, provided that they are not too far
apart. Pinar and Heath [1999] repeatedly remove and reinsert (henceforth 1-Reinsertion)
a single tuple at a better location. A variation is the BruteForcePeephole heuristic:
divide up the table into small non-overlapping partitions of rows, and find the optimal
solution that leaves the first and last row unchanged (that is, we solve a Traveling Salesman
Path Problem (TSPP) [Lam and Newman 2008]).
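The BruteForcePeephole idea can be sketched by enumerating the permutations of the interior rows of each small partition. This is only an illustration under our own assumptions: the function names and the default partition size of 6 are hypothetical, and exhaustive enumeration is practical only for very small partitions.

```python
from itertools import permutations


def hamming(r, s):
    """Number of components in which two same-length tuples differ."""
    return sum(1 for a, b in zip(r, s) if a != b)


def path_cost(rows):
    """Sum of Hamming distances between successive rows."""
    return sum(hamming(r, s) for r, s in zip(rows, rows[1:]))


def brute_force_peephole(rows, size=6):
    """Optimally reorder the interior of each block of `size` rows (a list)."""
    result = []
    for start in range(0, len(rows), size):
        block = rows[start:start + size]
        if len(block) > 3:  # fewer than two interior rows: nothing to permute
            first, middle, last = block[0], block[1:-1], block[-1]
            best = min(permutations(middle),
                       key=lambda p: path_cost([first, *p, last]))
            block = [first, *best, last]
        result.extend(block)
    return result
```

Because the first and last rows of each block stay in place, the cost at partition boundaries is unchanged and the total cost cannot increase.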



3.3. Scaling up the heuristics 

External-memory sorting is applicable to very large tables. However, even one of the fastest 
TSP heuristics (Nearest Neighbor) may fail to scale. We consider several strategies to 
alleviate this scalability problem. 

3.3.1. Sparse graph. Instead of trying to solve the problem over a dense graph, where every
tuple can follow any other tuple in the tour, we may construct a sparse graph [Reinelt 1994;
Johnson et al. 2004]. For example, the sparse graph might be constructed by limiting each
tuple to some of its near neighbors. A similar approach has also been used, for example,
in the design of heuristics in weighted matching [Grigoriadis and Kalantari 1988] and for
document identifier assignment [Ding et al. 2010]. In effect, we approximate the nearest
neighbors.

We consider a similar strategy. Instead of storing a sparse graph structure, we store
the table in several different orders. We compare rows only against other rows appearing

sorted lists to be approximate near neighbors [Indyk and Motwani 1998; Gionis et al. 1999;
Indyk et al. 1997; Chakrabarti et al. 1999; Liu 2004; Kushilevitz et al. 1998]. We implemented



Before we formally describe the Multiple Lists heuristic, consider the example given
in Fig. 1. Starting from an initial table (Fig. 1a), we sort the table lexicographically with
the first column as the primary key: this forms a list which we represent as solid edges in
the graph of Fig. 1c. Then, we re-sort the table, this time using the second column as the
primary key: this forms a second list which we represent as dotted edges in Fig. 1c. Finally,
starting from one particular row (say 1,3), we can greedily pick a nearest neighbor (say 3,3)
within the newly created sparse graph. We repeat this process iteratively (3,3 goes to 5,3
and so on) until we have the solution given in Fig. 1b.

Fig. 1: Row-reordering example with Multiple Lists. Panels: (a) Initial table; (b) Possible solution; (c) Multiply-linked list.

Hence, to apply Multiple Lists we pick several different ways to sort the table. For each
table order, we store the result in a dynamic data structure so that rows can be selected
in order and removed quickly. (Duplicate rows can be stored once if we keep track of their
frequencies.) One implementation strategy uses a multiply-linked list. Let K be the number
of different table orders. Add to each row room for 2K row pointers. First sort the rows
in the first order. With pointers, link the successive rows, as in a doubly-linked list, using
2 pointers per row. Re-sort the rows in the second order. Link successive rows, using another
2 pointers per row. Continue until all K orders have been processed and every row has
2K pointers. Removing a row in this data structure requires the modification of up to
4K pointers.

For our experiments, we applied Multiple Lists with K = c as follows. First sort the
table lexicographically$^3$ after ordering the columns by non-decreasing cardinalities ($N_1 \leq
N_2 \leq \cdots \leq N_c$). Then rotate the columns cyclically so that the first column becomes the
second one, the second one becomes the third one, and the last column becomes the first:
$1, 2, \ldots, c \to c, 1, 2, \ldots, c - 1$. Sort the table lexicographically according to this new column
order. Repeat this process until you have c different orders, each one corresponding to the
same table sorted lexicographically with a different column order.

3 Sorting with reflected Gray code yielded no appreciable improvement on Zipfian data.
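The construction of the c sort orders can be sketched as follows. For simplicity, this version re-sorts the whole table for each rotation, so it does not exploit any structure shared between successive orders; the name `rotated_sort_orders` is our own, and rows are assumed to be tuples of mutually comparable values.

```python
def rotated_sort_orders(rows):
    """Return c copies of the table, each sorted under a rotated column order."""
    c = len(rows[0])
    cardinality = [len({r[j] for r in rows}) for j in range(c)]
    # columns by non-decreasing cardinality
    order = sorted(range(c), key=lambda j: cardinality[j])
    versions = []
    for _ in range(c):
        key_columns = tuple(order)
        versions.append(sorted(rows, key=lambda r: tuple(r[j] for j in key_columns)))
        order = [order[-1]] + order[:-1]  # 1, 2, ..., c -> c, 1, 2, ..., c - 1
    return versions
```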

Once we have sorted the table several times, we proceed as in Nearest Neighbor (see
Algorithm 1):

- Start with an initially empty list β.

- Pick a row at random, remove it from all sorted lists and add it to list β.

- In all sorted lists, visit all rows that are immediately adjacent (in sorted order) to the last
row added. There are up to 2c such rows. Pick a nearest neighbor under the Hamming
distance, remove it from all sorted lists and append it to β.

- Repeat the last step until the sorted lists are empty. The solution is then given by list β,
which now contains all rows.

Multiple Lists runs in time $O(Kcn \log n)$, or $O(c^2 n \log n)$ when we use c sorted lists.
However, we can do better in the K = c case with cyclically rotated column order. First, we
sort with column order $1, 2, \ldots, c$ in $O(cn \log n)$ time. Then, we have $N_1$ lists (one list per
value in the first column) sorted in the $c, 1, 2, \ldots, c - 1$-lexicographical order. Thus, sorting
in the $c, 1, 2, \ldots, c - 1$-lexicographical order requires only $O(cn \log N_1)$ time. Similarly,
sorting in the $c - 1, c, 1, 2, \ldots, c - 2$-lexicographical order requires $O(cn \log N_2)$ time. And
so on. Thus, the total sorting time is in

$$O(cn \log n + cn \log N_1 + \cdots + cn \log N_{c-1}) = O(cn \log(n N_1 N_2 \cdots N_{c-1})).$$

We expect that $n N_1 N_2 \cdots N_{c-1} \ll n^c$ and thus

$$\log(n N_1 N_2 \cdots N_{c-1}) < c \log n$$

in practice. (This approach of reusing previous sorting orders when re-sorting is reminiscent
of the Pipe Sort algorithm [Agarwal et al. 1996].) The overall complexity of Multiple
Lists is thus $O(c^2 n + cn \log(n N_1 N_2 \cdots N_{c-1}))$.



Algorithm 1 The Multiple Lists heuristic

input: Unsorted table t with n rows and c columns.
output: a sorted table
Form K different versions of t, sorted differently: t^(1), t^(2), ..., t^(K)
β ← empty list
pick an element in t^(1) randomly, add it to β and remove it from all t^(i)'s
while size(β) < n do
    let r be the latest element added to β
    Given i ∈ {1, 2, ..., K}, there are up to two neighbors in sorted order within list t^(i); out of
    up to 2K such neighbors, pick a nearest neighbor r' to r in Hamming distance.
    Add r' to β and remove it from all t^(i)'s
end while
return β
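Algorithm 1 can be sketched in Python with one doubly-linked list per sort order, stored as arrays of row indices. This is an illustrative sketch rather than the implementation used in the experiments: the function names, the `column_orders` parameter (a list of K column permutations) and the `seed` parameter are our own, and ties between equally near neighbors are broken arbitrarily.

```python
import random


def hamming(r, s):
    """Number of components in which two same-length tuples differ."""
    return sum(1 for a, b in zip(r, s) if a != b)


def multiple_lists(rows, column_orders, seed=0):
    """Reorder rows greedily, looking only at neighbors in K sorted lists."""
    n = len(rows)
    if n == 0:
        return []
    links = []  # one (prev, next) pair of index arrays per sort order
    for order in column_orders:
        ids = sorted(range(n), key=lambda i: tuple(rows[i][j] for j in order))
        prev, nxt = [None] * n, [None] * n
        for a, b in zip(ids, ids[1:]):
            nxt[a], prev[b] = b, a
        links.append((prev, nxt))

    def remove(i):
        # unlink row i from all K lists by splicing its neighbors together
        for prev, nxt in links:
            p, q = prev[i], nxt[i]
            if p is not None:
                nxt[p] = q
            if q is not None:
                prev[q] = p

    current = random.Random(seed).randrange(n)
    result = [current]
    for _ in range(n - 1):
        # up to 2K candidates: rows adjacent to `current` in some sorted list
        candidates = [j for prev, nxt in links
                      for j in (prev[current], nxt[current]) if j is not None]
        remove(current)
        last = current
        current = min(candidates, key=lambda j: hamming(rows[last], rows[j]))
        result.append(current)
    return [rows[i] for i in result]
```

With two columns, a call such as `multiple_lists(rows, [(0, 1), (1, 0)])` uses both lexicographical orders.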



Although an increase in K degrades the running time, we expect each new list stored to 
improve the solution's quality. In fact, the heuristic Multiple Lists becomes equivalent 
to Nearest Neighbor when we maintain the table in all of the c! lexicographical sorting 
orders. This shows the following result: 

Lemma 3.3. The Nearest Neighbor heuristic over c-column tables and under the
Hamming distance is in $O(c \, c! \, n \log n)$.

When there are many columns (c > 4), constructing and maintaining c! lists might be
prohibitive. Through informal tests, we found that maintaining c different sort orders is a
good compromise. For c ≤ 2, Multiple Lists with c sorted lists is already equivalent to
Nearest Neighbor.

Unfortunately, our implementation of Multiple Lists is not suited to an external- 
memory implementation without partitioning. 

3.3.2. Partitioning. Several authors partition (or cluster) the tuples before applying a TSP
heuristic [Cesari 1996; Reinelt 1994; Karp 1977; Johnson et al. 2004; Schaller 1999]. Using
database terminology, they partition the table horizontally. We explore this approach. The
content of the horizontal partitions depends on the original order of the rows: the first rows 
are in the first partition and so on. Hence, we are effectively considering a tour-improvement 
technique: starting from an existing row ordering, we partition it and try to reduce the 
number of runs within each partition. For example, we can partition a lexicographically 
sorted table and process each partition in main memory using expensive heuristics such as 
Nearest Neighbor or random-access-intensive heuristics such as Multiple Lists. 

We can process each partition independently: the problem is embarrassingly parallel. Of 
course, this ignores runs created at the boundary between partitions. 

Sometimes, we know the final row of the previous partition. In such cases, we might 
choose the initial row in the current partition to have a small Hamming distance with the 
last row in the previous partition. In any case, this boundary effect becomes small as the 
sizes of the partitions grow. 

Another immediate practical benefit of the horizontal partitioning is that we have an
anytime (or interruptible) algorithm [Dean and Boddy 1988]. Indeed, we progressively
improve the row ordering, but can also abort the process at any time without losing the 
gains achieved up to that point. 

We could ensure that the number of runs is never increased. Indeed, whenever the
application of the heuristic on the partition makes matters worse, we can revert to the
original tuple order. Similarly, we can try several heuristics on each partition and pick the
best. Probabilistic heuristics such as Nearest Neighbor can also be repeated. Moreover,
we can repartition the table: e.g., each new partition (except the first and the last) can
take half its tuples from each of the two adjacent old partitions [Johnson and McGeoch
1997].

For simplicity, we can use partitions having a fixed number of rows (except maybe for 
the last partition). As an alternative, we could first sort the data and then create partitions 
based on the value of one or several columns. Thus, for example, we could ensure that all 
rows within a partition have the same value in one or several columns. 
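
The partition-and-revert idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`run_count`, `nearest_neighbor`, `improve_by_partition`) are hypothetical, the Nearest Neighbor variant shown always starts from the first row of the partition, and runs created at partition boundaries are ignored, as in the text.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, ...]


def run_count(rows: Sequence[Row]) -> int:
    """Number of runs of identical values, summed over all columns."""
    if not rows:
        return 0
    runs = len(rows[0])  # each column starts with one run
    for prev, cur in zip(rows, rows[1:]):
        runs += sum(a != b for a, b in zip(prev, cur))  # Hamming distance
    return runs


def nearest_neighbor(rows: Sequence[Row]) -> List[Row]:
    """Greedy Nearest Neighbor tour (quadratic time), starting from the first row."""
    remaining = list(rows)
    tour = [remaining.pop(0)]
    while remaining:
        last = tour[-1]
        best = min(
            range(len(remaining)),
            key=lambda i: sum(a != b for a, b in zip(last, remaining[i])),
        )
        tour.append(remaining.pop(best))
    return tour


def improve_by_partition(
    rows: Sequence[Row],
    size: int,
    heuristic: Callable[[Sequence[Row]], List[Row]] = nearest_neighbor,
) -> List[Row]:
    """Apply the heuristic to fixed-size partitions, reverting when it does not help."""
    out: List[Row] = []
    for start in range(0, len(rows), size):
        part = list(rows[start:start + size])
        candidate = heuristic(part)
        # Keep the new order only if it strictly reduces the runs in this partition.
        out.extend(candidate if run_count(candidate) < run_count(part) else part)
    return out


table = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]  # lexicographically sorted
reordered = improve_by_partition(table, size=6)
```

Because each partition is handled independently, the loop body could be dispatched to separate workers, and the process can be interrupted after any partition without losing earlier gains.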

4. MAXIMIZING THE NUMBER OF LONG RUNS 

It is convenient to model database compression by the number of runs (RunCount). How- 
ever, this model is clearly incomplete. For example, there is some overhead corresponding 
to each run of values we need to code: short runs are difficult to compress. 

4.1. Heuristics for long runs 

We developed a number of row-reordering heuristics whose goal was to produce long runs. 
Two straightforward approaches did not give experimental results that justified their costs.
One is due to an idea of Malik and Kender [2007]. Consider the Nearest Neighbor TSP
heuristic. Because we use the Hamming distance, there are often several nearest neighbors
for the last tuple added. Malik and Kender proposed a modification of Nearest Neighbor
where they determine the best nearest neighbor based on comparisons with the previous
tuples, and not only with the latest one. We considered many variations on this idea, and
none of them proved consistently beneficial: e.g., when there are several nearest neighbors
to the latest tuple, we tried reducing the set to a single tuple by removing the tuples that
are not also nearest to the second last tuple, and then removing the tuples that are not also
nearest to the third last tuple, and so on.

Fig. 2: Row-reordering example with Frequent-Component: (a) initial table; (b) triples
$(f(v_i), i, v_i)$; (c) triples $(f(v_i), i, v_i)$, sorted; (d) sorted triples; (e) solution

A novel heuristic, Iterated Matching, was also developed but found to be too
expensive for the quality of results typically obtained. It is based on the observation that
a weighted-matching algorithm can form pairs of rows that have many length-two runs.
Pairs of rows can themselves be matched into collections of four rows with many
length-four runs, etc. Unfortunately, known maximum-matching algorithms are expensive and the
experimental results obtained with this heuristic were not promising. Details can be found
elsewhere [Lemire et al. 2012].



4.2. The Frequent-Component order 

Intuitively, we would like to sort rows so that frequent values are more likely to appear
consecutively. The Frequent-Component [Lemire et al. 2010] order follows this intuition.

As a preliminary step, we compute the frequency f(v) of each column value v within each
of the c columns. Given a tuple, we map each component to the triple (frequency, column
index, column value) (see footnote 4). Thus, from the c components, we derive c triples (see
Figs. 2a and 2b): e.g., given the tuple $(v_1, v_2, v_3)$, we get the triples
$((f(v_1), 1, v_1), (f(v_2), 2, v_2), (f(v_3), 3, v_3))$.
We then lexicographically sort the triples, so that the triple corresponding to a
most-frequent column value appears first; that is, we sort in reverse lexicographical order (see
Fig. 2c): e.g., assuming that $f(v_3) < f(v_1) < f(v_2)$, the triples appear in sorted order as
$((f(v_2), 2, v_2), (f(v_1), 1, v_1), (f(v_3), 3, v_3))$. The new tuples are then compared against each
other lexicographically over the 3c values (see Fig. 2d). When sorting, we can precompute
the ordered lists of triples for speed. As a last step, we reconstruct the solution from the
list of triples (see Fig. 2e).



Consider a table where columns have uniform histograms: given n rows and a column
of cardinality $N_i$, each value appears $n/N_i$ times. In such a case, Frequent-Component
becomes equivalent to the lexicographic order with the columns organized in non-decreasing
cardinality.
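
To make the construction concrete, here is a minimal Python sketch of the Frequent-Component order as described in this section. The function name `frequent_component_sort` is hypothetical, column values are assumed to be mutually comparable integers, and the final comparison over the 3c values is assumed to be in ascending lexicographic order.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def frequent_component_sort(rows: Sequence[Row]) -> List[Row]:
    """Sort rows by their frequency triples (frequency, column index, value)."""
    c = len(rows[0]) if rows else 0
    # Preliminary step: frequency of each value within each column.
    freq = [Counter(row[j] for row in rows) for j in range(c)]

    def key(row: Row) -> Tuple[int, ...]:
        # One triple per component; column indexes are 1-based as in the text.
        triples = [(freq[j][row[j]], j + 1, row[j]) for j in range(c)]
        # Reverse lexicographical sort: a most frequent component comes first.
        triples.sort(reverse=True)
        # Flatten into 3c values that are compared lexicographically.
        return tuple(x for triple in triples for x in triple)

    # Sorting the original rows by this key reconstructs the solution directly.
    return sorted(rows, key=key)


# A table with uniform histograms: cardinalities 4 (first column) and 2 (second).
uniform = [(a, b) for a in range(4) for b in range(2)]
ordered = frequent_component_sort(uniform)
```

On this uniform table the result coincides with a lexicographic sort where the lower-cardinality second column is considered first, which matches the equivalence stated above.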

4.3. Vortex: a novel order 

The Frequent-Component order has at least two inconveniences: 

4 This differs slightly from the original presentation of the order [Lemire et al. 2010] where the column value
appeared before the column index.

Fig. 3: Two-column tables sorted in various orders: (a) lexicographic; (b) reflected Gray
code; (c) Vortex; (d) z-order with Gray codes; (e) Hilbert



- Given c columns having N distinct values apiece, a table where all possible rows are
present has $N^c$ distinct rows. In this instance, Frequent-Component is equivalent to the
lexicographic order. Thus, its RunCount is $N^c + N^{c-1} + \cdots + N$. Yet a better solution would
be to sort the rows in a Gray-code order, generating only $N^c + c - 1$ runs. For mathematical
elegance, we would rather have Gray-code orders even though the Gray-code property may
not enhance compression in practice.

- The Frequent-Component order requires comparisons between the frequencies of 
values that are in different columns. Hence, we must at least maintain one ordered list of 
values from all columns. We would prefer a simpler alternative with less overhead. 

Thus, to improve over Frequent-Component, we want an order that considers the 
frequencies of column values and yet is a Gray-code order. Unlike Frequent-Component, 
we would prefer that it compare frequencies only between values from the same column. 

Many orders, such as z-orders with Gray codes [Faloutsos 1986] (see Fig. 3d) and Hilbert
orders [Hamilton and Rau-Chaplin 2008; Kamel and Faloutsos 1994; Eavis and Cueva 2007]
(see Fig. 3e), use some form of bit interleaving: when comparing two tuples, we begin by
comparing the most significant bits of their values before considering the less significant
bits. Our novel Vortex order interleaves individual column values instead (see Fig. 3c).
Informally, we describe the order as follows:

- Pick a most frequent value $x^{(1)}$ from the first column, select all tuples having the value
$x^{(1)}$ as their first component, and put them first (see Fig. 5b with $x^{(1)} = 1$);

- Consider the second column. Pick a most frequent value $y^{(1)}$. Among the tuples having
$x^{(1)}$ as their first component, select all tuples having $y^{(1)}$ as their second component and
put them last. Among the remaining tuples, select all tuples having $y^{(1)}$ as their second
component, and put them first (see Fig. 5c with $y^{(1)} = 1$);

- Repeat.



Our intuition is that, compared to bit interleaving, this form of interleaving is more likely
to generate runs of identical values. The name Vortex comes from the fact that initially
there are long runs, followed by shorter and shorter runs (see Figs. 3c and 4).

Reordering Rows for Better Compression: Beyond the Lexicographic Order

Fig. 4: Graphical representation of some of the order relations between tuples in {1, 2, 3, 4} x
{1, 2, 3, 4} under the Vortex order

Fig. 5: First two steps in sorting with Vortex: (b) value 1, first column; (c) value 1, second
column

To describe the Vortex order formally (see Algorithm 2), we introduce the alternating
lexicographic order (henceforth alternating). Given two tuples a and b, let j be the first
component where the two tuples differ ($a_j \neq b_j$ but $a_i = b_i$ for $i < j$), then $a <_{\mathrm{ALT}} b$ if
and only if $(a_j < b_j) \oplus (j \text{ is even})$ where $\oplus$ is the exclusive or. (We use the convention
that components are labeled from 1 to c so that the first component is odd.) Given a tuple
$x = (x_1, x_2, \ldots, x_c)$, let $T(x)$ be $(x_1, 1), (x_2, 2), \ldots, (x_c, c)$ and $T'(x)$ be the list $T(x)$ sorted
lexicographically. Then $x <_{\mathrm{VORTEX}} y$ if and only if $T'(x) <_{\mathrm{ALT}} T'(y)$. Vortex generates a
total order on tuples because $T'$ is bijective and alternating is a total order.

We illustrate Vortex sorting in Fig. 6. First, the initial table is normalized by frequency
so that the most frequent value in each column is mapped to value 1 (see Figs. 6a, 6b and 6c).



In Fig. 6d we give the corresponding T' values: e.g., the row 1, 4 becomes (1, 1), (4, 2). We
then sort the T' values using the alternating order (see Fig. 6e) before finally inverting
the T' values to recover the rows in Vortex order (see Fig. 6f). Of course, these are not
the rows of the original table, but rather the rows of the renormalized table. We could further
reverse the normalization to recover the initial table in Vortex order.

Like the lexicographical order, the Vortex order is oblivious to the column cardinalities
$N_1, N_2, \ldots, N_c$: we only use the content of the two tuples to determine which is smallest
(see Algorithm 2).
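
As an illustration, the pairwise comparison can be written directly in Python. This is a sketch rather than the authors' implementation: the names `vortex_less` and `vortex_sort` are hypothetical, and the tuples are assumed to be already normalized so that smaller integers stand for more frequent values.

```python
from functools import cmp_to_key
from itertools import product
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def vortex_less(x: Row, y: Row) -> bool:
    """Return True if x comes strictly before y in the Vortex order."""
    # T'(x): the (value, column index) pairs sorted lexicographically.
    xs = sorted((v, i) for i, v in enumerate(x, start=1))
    ys = sorted((v, i) for i, v in enumerate(y, start=1))
    # Alternating order: the comparison is reversed at even (1-based) positions.
    for pos, (p, q) in enumerate(zip(xs, ys), start=1):
        if p != q:
            return (p < q) != (pos % 2 == 0)  # "!=" on booleans is exclusive or
    return False  # x == y


def vortex_sort(rows: Sequence[Row]) -> List[Row]:
    """Sort rows using the pairwise Vortex comparison."""
    def compare(a: Row, b: Row) -> int:
        if vortex_less(a, b):
            return -1
        if vortex_less(b, a):
            return 1
        return 0

    return sorted(rows, key=cmp_to_key(compare))


# All three-column tuples with values in {1, 2}.
binary_tuples = vortex_sort(list(product((1, 2), repeat=3)))
```

The comparison rebuilds the sorted pair lists on every call, which keeps memory use low at the cost of repeated work; precomputing the lists once per row is the obvious alternative when memory allows.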

Compared with Frequent-Component, Vortex always chooses the first (most fre- 
quent) value from column 1, then the most frequent value from column 2, and so forth. 
Indeed, we can easily show the following proposition using the formal definition of the 
Vortex order: 

Lemma 4.1. Consider tuples with positive integer values. For any $1 \le k \le c$, suppose
$\tau$ is a tuple containing the value 1 in one of its first k components, and $\tau'$ is a tuple that
does not contain the value 1 in any of its first k components. Then $\tau <_{\mathrm{VORTEX}} \tau'$.

Fig. 6: Row-reordering example with Vortex: (d) T' values; (e) alternating order;
(f) solution (normalized)

Algorithm 2 Comparison between two tuples of integers by the Vortex order. We
recommend organizing the columns in non-decreasing order of cardinality and the column values
in non-increasing order of frequency.

input: two tuples $x = (x_1, x_2, \ldots, x_c)$ and $y = (y_1, y_2, \ldots, y_c)$
output: whether $x <_{\mathrm{VORTEX}} y$
$x' \leftarrow (x_1, 1), (x_2, 2), \ldots, (x_c, c)$
sort the list $x'$ lexicographically {E.g., (13, 1), (12, 2) -> (12, 2), (13, 1).}
$y' \leftarrow (y_1, 1), (y_2, 2), \ldots, (y_c, c)$
sort the list $y'$ lexicographically
for $i = 1, 2, \ldots, c$ do
    if $x'_i \neq y'_i$ then
        return $(x'_i <_{\mathrm{LEXICO}} y'_i) \oplus (i \text{ is even})$
    end if
end for
return false {We have x = y.}

Frequent-Component instead chooses the most frequent value overall, then the
second-most frequent value overall, and so forth, regardless of which column contains them.
Both Frequent-Component and Vortex list all tuples containing the first value
consecutively. However, whereas Vortex also lists the tuples containing the second value
consecutively, Frequent-Component may fail to do so. Thus, we expect Vortex to produce
fewer runs among the most frequent values, compared to Frequent-Component.

Whereas the Hilbert order is only a Gray code when N is a power of two, we want
to show that Vortex is always an N-ary Gray code. That is, sorting all of the tuples
in $\{1, 2, \ldots, N\} \times \cdots \times \{1, 2, \ldots, N\} = \{1, 2, \ldots, N\}^c$ creates a Gray code: the Hamming
distance of successive tuples is one. We believe that of all possible orders, we might as well
pick those with the Gray-code property, everything else being equal.
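
The Gray-code property can be checked empirically on small complete tables. The sketch below is an illustration, not a proof: it encodes the Vortex order as a sort key (the hypothetical `vortex_key` negates the pairs at even positions to reverse their comparison, which assumes integer values) and compares the resulting RunCount with that of the lexicographic order.

```python
from itertools import product
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def vortex_key(row: Row) -> Tuple[Tuple[int, int], ...]:
    """Sort key equivalent to the Vortex order for integer-valued tuples."""
    pairs = sorted((v, i) for i, v in enumerate(row, start=1))
    # Negating both entries of a pair reverses its lexicographic comparison.
    return tuple(
        p if pos % 2 == 1 else (-p[0], -p[1])
        for pos, p in enumerate(pairs, start=1)
    )


def hamming(a: Row, b: Row) -> int:
    return sum(u != v for u, v in zip(a, b))


def is_gray_code(rows: Sequence[Row]) -> bool:
    """True when successive rows always differ in exactly one component."""
    return all(hamming(a, b) == 1 for a, b in zip(rows, rows[1:]))


def run_count(rows: Sequence[Row]) -> int:
    """Runs over all columns: one per column, plus one per changed value."""
    return len(rows[0]) + sum(hamming(a, b) for a, b in zip(rows, rows[1:]))


N, c = 3, 3
lexicographic: List[Row] = list(product(range(1, N + 1), repeat=c))
vortex: List[Row] = sorted(lexicographic, key=vortex_key)
```

For N = 3 and c = 3, the lexicographic order yields $3^3 + 3^2 + 3 = 39$ runs, whereas a Gray-code order yields $3^3 + 3 - 1 = 29$ runs.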

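Let $\mathrm{VORTEX}(N_1, N_2, \ldots, N_c)$ be the $\prod_{i=1}^{c} N_i$ tuples in $\{1, 2, \ldots, N_1\} \times \cdots \times \{1, 2, \ldots, N_c\}$
sorted in the Vortex order. Let

$$\Lambda^{N}_{c,k} = \mathrm{VORTEX}(\underbrace{N, N, \ldots, N}_{c-k}, \underbrace{N-1, N-1, \ldots, N-1}_{k}).$$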

We begin by the following technical lemmata which allow us to prove that Vortex is a 
Gray code by induction. 

Lemma 4.2. If $\Lambda^{N}_{c-1,k-1}$ and $\Lambda^{N}_{c,k}$ are Gray codes, then so is $\Lambda^{N}_{c,k-1}$ for any integers
$N > 1$, $c \ge 2$, $k \in \{1, \ldots, c\}$.

Proof. Assume that $\Lambda^{N}_{c-1,k-1}$ and $\Lambda^{N}_{c,k}$ are Gray codes. The $\Lambda^{N}_{c,k-1}$ tuples begin with the
tuple

$$(1, \underbrace{N, \ldots, N}_{c-k}, \underbrace{N-1, N-1, \ldots, N-1}_{k-1})$$

and they continue up to the tuple

$$(1, 1, \underbrace{N, \ldots, N}_{c-k-1}, \underbrace{N-1, N-1, \ldots, N-1}_{k-1}).$$

Except for the first column which has a fixed value (one), these tuples are in reverse $\Lambda^{N}_{c-1,k-1}$
order, so they form a Gray code. The next tuple in $\Lambda^{N}_{c,k-1}$ is

$$(N, 1, \underbrace{N, \ldots, N}_{c-k-1}, \underbrace{N-1, N-1, \ldots, N-1}_{k-1})$$

and the following tuples are in an order equivalent to $\Lambda^{N}_{c,k}$, except that we must consider
the first column as the last and decrement its values by one, while the second column is
considered the first, the third column the second, and so on. The proof concludes. □

Lemma 4.3. For all $c \ge 1$ and all $k \in \{0, 1, \ldots, c\}$, $\Lambda^{2}_{c,k}$ is a Gray code.

Proof. We have that $\Lambda^{2}_{c,0}$ is a reflected Gray code with column values 1 and 2: e.g., for
c = 3, we have the order

$$\Lambda^{2}_{3,0} = \begin{matrix} 1,2,2 \\ 1,2,1 \\ 1,1,1 \\ 1,1,2 \\ 2,1,2 \\ 2,1,1 \\ 2,2,1 \\ 2,2,2 \end{matrix}$$

Thus, we have that $\Lambda^{2}_{c,0}$ is always a Gray code. Any column with cardinality one can be
trivially discarded. Hence, we have that $\Lambda^{2}_{c,k}$ is always a Gray code for all $k \in \{0, 1, \ldots, c\}$,
proving the lemma. □
Fig. 7: Illustration of the basis for the induction argument in the proof of Proposition 4.5



Lemma 4.4. For any value of N and for k = 0 or k = 1, $\Lambda^{N}_{1,k}$ is a Gray code.

Proof. We have that $\Lambda^{N}_{1,0}$ is a Gray code for any value of N because one-component
tuples are always Gray codes. It immediately follows that $\Lambda^{N}_{1,1}$ is also a Gray code for any
N since, by definition, $\Lambda^{N}_{1,1} = \Lambda^{N-1}_{1,0}$. □

Proposition 4.5. The Vortex order is an N-ary Gray code. 

Proof. The proof uses a multiple induction argument, which we express as pseudocode.
At the end of each pass through the loop on N, we have that $\Lambda^{N'}_{c,k}$ is a Gray code for all
$N' \le N$, all $c \in \{2, \ldots, C\}$ and all $k \in \{0, 1, 2, \ldots, c\}$. The induction begins from the cases
where there are only two values per column (N = 2, see Lemma 4.3) or only one column
(c = 1, Lemma 4.4). See Fig. 7 for an illustration.

1: for $N = 3, 4, \ldots, \mathcal{N}$ do
2:   We have that $\Lambda^{N-1}_{c,k}$ is a Gray code for all $c \in \{2, 3, \ldots, C\}$ and also for all
     $k \in \{0, 1, 2, \ldots, c\}$. {For N > 3, this is true from the previous pass in the loop
     on N. For N = 3, it follows from Lemma 4.3.}
3:   for $c = 2, \ldots, C$ do
4:     By line 2, $\Lambda^{N-1}_{c,0}$ is a Gray code, and by definition $\Lambda^{N}_{c,c} = \Lambda^{N-1}_{c,0}$.
5:     for $k = c, c-1, \ldots, 1$ do
6:       (1) $\Lambda^{N}_{c,k}$ is a Gray code {when k = c it follows by line 4 and otherwise by
         line 8 from the previous pass of the loop on k}
7:       (2) $\Lambda^{N}_{c-1,k-1}$ is a Gray code {when c = 2, it follows by Lemma 4.4 and for
         c > 2 it follows from the previous pass of the loop on c}
8:       (1) + (2) => $\Lambda^{N}_{c,k-1}$ is a Gray code by Lemma 4.2
9:     end for
10:    Hence, $\Lambda^{N}_{c,0}$ is a Gray code. (And also $\Lambda^{N}_{c,k}$ for all $k \in \{0, 1, 2, \ldots, c\}$.)
11:  end for
12: end for

The integers $\mathcal{N}$ and C can be arbitrarily large. Thus, the pseudocode shows that $\Lambda^{N}_{c,k}$ is
always a Gray code which proves that Vortex is an N-ary Gray code for any number of
columns and any value of N. □

5. EXPERIMENTS: TSP AND SYNTHETIC DATA 

We present two groups of experiments. In this section, we use synthetic data to compare 
many heuristics for minimizing RunCount. Because the minimization of RunCount 
reduces to the TSP over the Hamming distance, we effectively assess TSP heuristics. 
Two heuristics stand out, and then in § 6 these two heuristics are assessed in more
realistic settings with actual database compression techniques and large tables. The Java
source code necessary to reproduce our experiments is at
http://code.google.com/p/rowreorderingjavalibrary/.
Table II: Relative reduction in RunCount compared to lexicographical sort (set at 1.0),
for Zipfian tables (c = 4)

                                     relative RunCount reduction
                                     n = 8 192   n = 131 072   n = 1 048 576
Lexicographic Sort                   1.000       1.000         1.00
Multiple Lists                       1.167       1.188         1.204
Vortex                               1.154       1.186         1.203
Frequent-Component                   1.151       1.186         1.203
Nearest Neighbor                     1.223       1.242         -
Savings                              1.225       1.243         -
Multiple Fragment                    1.219       1.232         -
Farthest Insertion                   1.187       -             -
Nearest Insertion                    1.214       -             -
Random Insertion                     1.201       -             -
Lexicographic Sort+1-reinsertion     1.171       -             -
Vortex+1-reinsertion                 1.193       -             -
Frequent-Component+1-reinsertion     1.191       -             -


The aHDO scheme runs a complete pass through the table, trying to permute successive
rows. If a pair of rows is permuted, we run through another pass over the entire table. We
repeat until no improvement is possible. In contrast, the more expensive tour-improvement
heuristics (BruteForcePeephole and 1-reinsertion) do a single pass through the table.
The BruteForcePeephole is applied on successive blocks of 8 rows.



5.1. Reducing the number of runs on Zipfian tables 



Zipfian distributions are commonly used [Eavis and Cueva 2007; Houkjær et al. 2006; Gray
et al. 1994] to model value distributions in databases: within a column, the frequency of
the $i$th value is proportional to $1/i$. If a table has n rows, we allow each column to have
n possible distinct values, not all of which will usually appear. We generated tables with
8 192 to 1 048 576 rows, using four Zipfian-distributed columns that were generated
independently. Applying Lemma 3.1 to these tables, the lexicographical order is $\omega$-optimal for
$\omega \approx 3$.
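
A small generator for such tables might look as follows. This is a sketch under assumptions, not the generator used for the experiments: the function names and the seed handling are hypothetical, and the values are drawn with Python's standard `random.choices` using weights proportional to 1/i.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def zipfian_table(n: int, c: int, seed: int = 0) -> List[Row]:
    """Generate n rows with c independent Zipfian-distributed columns."""
    rng = random.Random(seed)
    values = range(1, n + 1)             # n possible distinct values per column
    weights = [1.0 / i for i in values]  # frequency of the i-th value ~ 1/i
    columns = [rng.choices(values, weights=weights, k=n) for _ in range(c)]
    return list(zip(*columns))


def column_runs(rows: Sequence[Row], j: int) -> int:
    """Number of runs of identical values in column j."""
    return 1 + sum(a[j] != b[j] for a, b in zip(rows, rows[1:]))


def run_count(rows: Sequence[Row]) -> int:
    """RunCount: total number of runs over all columns."""
    return sum(column_runs(rows, j) for j in range(len(rows[0])))


table = zipfian_table(n=1000, c=4)
sorted_table = sorted(table)  # lexicographic order
print(run_count(table), run_count(sorted_table))
```

After a lexicographic sort, the first column necessarily has exactly one run per distinct value; the later columns are where other row orders can hope to improve on it.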

The RunCount results are presented in Table II. We present relative RunCount
reduction values: a value of 1.2 means that the heuristic is 20% better than lexicographic sort.
For some less scalable heuristics, we only give results for small tables. Moreover, we omit
results for aHDO [Malik and Kender 2007] and BruteForcePeephole (with partitions of
eight rows) because these tour-improvement heuristics failed to improve any tour by more
than 1%. Because BruteForcePeephole fails, we conclude that all heuristics we consider
are "locally almost optimal" because you cannot improve them appreciably by optimizing
subsets of 8 consecutive rows.

Both Frequent-Component and Vortex are better than the lexicographic order. 
The Multiple Lists heuristic is even better but slower. Other heuristics such as Nearest 
Neighbor, Savings, Multiple Fragment and the insertion heuristics are even better, 
but with worse running time complexity. For all heuristics, the run-reduction efficiency 
grows with the number of rows. 

5.2. Reducing the number of runs on uniformly distributed tables 

We also ran experiments using uniformly distributed tables, generating n-row tables where
each column has n possible values. Any value within the table can take one of n distinct
values with probability 1/n (see footnote 5). For these tables, the lexicographical order is
3.6-optimal according to Lemma 3.1.

5 We abuse the terminology slightly by referring to these tables as uniformly distributed: the models used
to generate the tables are uniformly distributed, but the data of the generated tables may not have uniform
histograms.

Table III: Relative reduction in RunCount compared to lexicographical sort (set at 1.0),
for uniformly distributed tables (c = 4)

                                     relative RunCount reduction
                                     n = 8 192   n = 131 072   n = 1 048 576
Lexicographic Sort                   1.000       1.000         1.00
Multiple Lists                       1.127       1.128         1.128
Vortex                               1.020       1.020         1.021
Frequent-Component                   1.022       1.023         1.023
Nearest Neighbor                     1.122       1.123         -
Savings                              1.122       1.123         -
Multiple Fragment                    1.133       1.133         -
Farthest Insertion                   1.075       -             -
Nearest Insertion                    1.129       -             -
Random Insertion                     1.103       -             -
Lexicographic Sort+1-reinsertion     1.092       -             -
Vortex+1-reinsertion                 1.094       -             -
Frequent-Component+1-reinsertion     1.080       -             -

Table III summarizes the results. The most striking difference with Zipfian tables is
that the efficiency of most heuristics drops drastically. Whereas Vortex and
Frequent-Component are 20% superior to the lexicographical order on Zipfian data, they are barely
better (2%) on uniformly distributed data. Moreover, we fail to see improved gains as the
tables grow larger, unlike the Zipfian case. Intuitively, a uniform model implies that there
are fewer opportunities to create long runs of identical values in several columns, when
compared to a Zipfian model. This probably explains the poor performance of Vortex and
Frequent-Component.

We were surprised by how well Multiple Lists performed on uniformly distributed
data, even for small n. It fared better than most heuristics, including Nearest Neighbor,
by beating lexicographic sort by 13%. Since Multiple Lists is a variant of the greedy
Nearest Neighbor that considers a subset of the possible neighbors, we see that more
choice does not necessarily give better results. Multiple Lists is a good choice for this
problem.

5.3. Discussion

For these in-memory synthetic data sets, we find that the good heuristics to minimize
RunCount (that is, to solve the TSP under the Hamming distance) are Vortex and
Multiple Lists. They are both reasonably scalable in the number of rows ($O(n \log n)$)
and they perform well as TSP heuristics.

Frequent-Component would be another worthy alternative, but it is harder to
implement as efficiently as Vortex. Similarly, we found that Savings and Multiple Fragment
could be superior TSP heuristics for Zipfian and uniformly distributed tables, but they scale
poorly with respect to the number of rows: they have a quadratic running time ($O(n^2)$).
For small tables (n = 131 072), they were three and four orders of magnitude slower than
Multiple Lists in our tests.

6. EXPERIMENTS WITH REALISTIC TABLES

Minimizing RunCount on synthetic data might have theoretical appeal and be
somewhat applicable to many applications. However, we also want to determine whether
row-reordering heuristics can specifically improve table compression, and therefore database
performance, on realistic examples. This requires that we move from general models such
as RunCount to size measurements based on specific database-compression techniques. It
also requires large real data sets.

6.1. Realistic column storage 



We use conventional dictionary coding [Lemke et al. 2010; Binnig et al. 2009; Poess and
Potapov 2003] prior to row-reordering and compression. That is, we map column values
bijectively to 32-bit integers in [0, N) where N is the number of distinct column values (see
footnote 6). We map the most frequent values to the smallest integers [Lemire and Kaser 2011].
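
The frequency-ordered dictionary coding just described can be sketched as follows. The function name `dictionary_code` is hypothetical, and breaking ties between equally frequent values by comparing the values themselves is an assumption made here only so that the result is deterministic.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def dictionary_code(column: Sequence[Hashable]) -> Tuple[List[int], List[Hashable]]:
    """Map column values bijectively to integers in [0, N), most frequent first."""
    counts = Counter(column)
    # Assumption: ties between equally frequent values are broken by value order.
    dictionary = sorted(counts, key=lambda v: (-counts[v], v))
    codes = {value: code for code, value in enumerate(dictionary)}
    return [codes[value] for value in column], dictionary


column = ["b", "a", "b", "c", "b", "a"]
coded, dictionary = dictionary_code(column)
```

Decoding is a plain lookup, `[dictionary[k] for k in coded]`, so only the integer codes need to take part in row reordering and compression.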

We compress tables using five database compression schemes: Sparse, Indirect and Prefix 
coding as well as a fast variation on Lempel-Ziv and RLE. We compare the compression 
ratio of each technique under row reordering. For simplicity, we select only one compression 
scheme at a time: all columns are compressed using the same technique. 



6.1.1. Block-wise compression. The SAP NetWeaver platform [Lemke et al. 2010] uses three
compression techniques: Indirect, Sparse and Prefix coding. We implemented them and set
the block size to p = 128 values.

- Indirect coding is a block-aware dictionary technique. For each column and each block,
we build a list of the $N'$ values encountered. Typically, $N' \ll N$, where N is the total number
of distinct values in the column. Column values are then mapped to integers in $[0, N')$ and
packed using $\lceil \log N' \rceil$ bits each [Ng and Ravishankar 1997; Binnig et al. 2009]. Of course,
we must also store the actual values of the $N'$ codes used for each block and column. Thus,
whereas dictionary coding requires $p \lceil \log N \rceil$ bits to store a column, Indirect coding requires
$N' \lceil \log N \rceil + p \lceil \log N' \rceil$ bits, plus the small overhead of storing the value of $N'$. In the worst
case, the storage cost of indirect coding is twice as large as the storage cost of conventional
dictionary coding. However, whenever $N'$ is small enough, indirect coding is preferable.
Indirect coding is related to the block-wise value coding used by Oracle [Poess and Potapov
2003].

- Sparse coding stores the most frequent value using a p-bit bitmap to indicate where
this most frequent value appears. Other values are stored using $\lceil \log N \rceil$ bits. If the most
frequent value appears $\zeta$ times, then the total storage cost is $(p - \zeta + 1) \lceil \log N \rceil + p$ bits.

- Prefix coding begins by counting how many times the first value repeats at the
beginning of the block; this counter is stored first along with the value being repeated. Then
all other values are packed. In the worst case, the first value is not repeated, and Prefix
coding wastes $\lceil \log p \rceil$ bits compared to conventional dictionary coding. Because it counts
the length of a run, Prefix coding can be considered a form of RLE (§ 6.1.3).
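
The storage costs above are easy to compare numerically. The following Python sketch evaluates them for a single block; the function names are hypothetical, logarithms are taken in base 2, the overhead of storing $N'$ is ignored, and the Prefix cost formula (a $\lceil \log p \rceil$-bit counter plus one packed value for the initial run and one for each remaining value) is our reading of the description rather than a formula given in the text.

```python
from collections import Counter
from typing import Sequence


def ceil_log2(m: int) -> int:
    """Ceiling of log2(m) for m >= 1."""
    return (m - 1).bit_length()


def dictionary_bits(block: Sequence[int], N: int) -> int:
    return len(block) * ceil_log2(N)


def indirect_bits(block: Sequence[int], N: int) -> int:
    n_local = len(set(block))  # N': distinct values in this block
    return n_local * ceil_log2(N) + len(block) * ceil_log2(n_local)


def sparse_bits(block: Sequence[int], N: int) -> int:
    p = len(block)
    zeta = Counter(block).most_common(1)[0][1]  # count of the most frequent value
    return (p - zeta + 1) * ceil_log2(N) + p


def prefix_bits(block: Sequence[int], N: int) -> int:
    # Assumed cost model: counter, then the repeated value, then all other values.
    p = len(block)
    r = 1  # length of the initial run
    while r < p and block[r] == block[0]:
        r += 1
    return ceil_log2(p) + (p - r + 1) * ceil_log2(N)


constant_block = [7] * 128           # best case: a single long run
distinct_block = list(range(128))    # worst case: no repeated value
for name, cost in [("dictionary", dictionary_bits), ("indirect", indirect_bits),
                   ("sparse", sparse_bits), ("prefix", prefix_bits)]:
    print(name, cost(constant_block, 1000), cost(distinct_block, 128))
```

With 128 distinct values in a block and N = 128, Indirect coding costs exactly twice the dictionary cost, which is the worst case mentioned above.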



6.1.2. Lempel-Ziv-Oberhumer. Lempel-Ziv compression [Ziv and Lempel 1978] compresses
data by replacing sequences of characters by references to previously encountered sequences.
The Lempel-Ziv-Oberhumer (LZO) library [Oberhumer 2011] implements fast versions of
the conventional Lempel-Ziv compression technique. Abadi et al. [2006] evaluated several
alternatives including Huffman and Arithmetic encoding, but found that compared with
LZO, they were all too expensive for databases.

If a long data array is made of repeated characters (e.g., aaaaa) or repeated short
sequences (e.g., ababab), we expect most Lempel-Ziv compression techniques to generate
a compressed output that grows logarithmically with the size of the input. It appears to
be the case with the LZO library: its LZO1X codec uses 16 bytes to code 64 identical
32-bit integers, 17 bytes to code 128 identical integers, and 19 bytes for 256 integers. For
our tests, we used LZO version 2.05.



6.1.3. Run-Length Encoding. We implemented RLE by storing each run as a triple
[Stonebraker et al. 2005]: value, starting point and length of the run. Values are packed using
$\lceil \log N \rceil$ bits whereas both the starting point and the length of the run are stored using
$\lceil \log n \rceil$ bits, where n is the number of rows.

6 We do not store actual column values (such as strings): their compression is outside our scope.

Table IV: Characteristics of data sets used with bounds on the optimality of lexicographical
order ($\omega$), and a measure of statistical deviation ($p_0$).

                rows          distinct rows   cols   $\sum_i N_i$   $\omega$   $p_0$
Census1881      4 277 807     4 262 238       7      343 422        2.9        0.17
Census-Income   199 523       196 294         42     103 419        12         0.65
Wikileaks       1 178 559     1 147 059       4      10 513         1.3        0.04
SSB (DBGEN)     240 012 290   240 012 290     15     8 874 195      8.0        0.10
Weather         124 164 607   124 164 371     19     52 369         4.5        0.36
USCensus2000    37 019 068    22 493 432      10     2 774 239      3.4        0.54

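
A run-length encoder following this triple layout takes only a few lines. The sketch below is illustrative: the function names are hypothetical, starting points are 0-based, and logarithms are taken in base 2.

```python
from typing import List, Sequence, Tuple


def ceil_log2(m: int) -> int:
    """Ceiling of log2(m) for m >= 1."""
    return (m - 1).bit_length()


def rle_triples(column: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Encode a column as (value, starting point, run length) triples."""
    triples = []
    start = 0
    for i in range(1, len(column) + 1):
        # Close the current run at the end of the column or when the value changes.
        if i == len(column) or column[i] != column[start]:
            triples.append((column[start], start, i - start))
            start = i
    return triples


def rle_bits(column: Sequence[int], N: int) -> int:
    """Storage cost: ceil(log N) bits per value, ceil(log n) per start and length."""
    n = len(column)
    return len(rle_triples(column)) * (ceil_log2(N) + 2 * ceil_log2(n))


column = [5, 5, 5, 2, 2, 7]
print(rle_triples(column))   # [(5, 0, 3), (2, 3, 2), (7, 5, 1)]
print(rle_bits(column, N=8))
```

Since every run costs the same number of bits under this scheme, the compressed size of a column is proportional to its number of runs, which is why RunCount is a natural model for RLE.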

6.2. Data sets 



We selected realistic data sets (see Table IV): Census1881 [Lemire and Kaser 2011;
Programme de recherche en démographie historique 2009], Census-Income [Frank and
Asuncion 2010], a Wikileaks-related data set, the fact table from the Star Schema Benchmark
(SSB) [O'Neil et al. 2009] and Weather [Hahn et al. 2004]. Census1881 comes from the
Canadian census of 1881: it is over 305 MB and over 4 million records. Census-Income is the
smallest data set with 100 MB and 199 523 records. However, it has 42 columns and one
column has a very high relative cardinality (99 800 distinct values). The Wikileaks table was
created from a public repository published by Google (see footnote 7) and it contains the
non-classified metadata related to leaked diplomatic cables. We extracted 4 columns: year, time, place
and descriptive code. It has 1 178 559 records. We generated the SSB fact table using a
version of the DBGEN software modified by O'Neil (see footnote 8). We used a scale factor of 40
to generate it: that is, we used command dbgen -s 40 -T 1. It is 20 GB and includes 240 million rows.
The largest non-synthetic data set (Weather) is 9 GB. It consists of 124 million surface
synoptic weather reports from land stations for the 10-year period from December 1981
through November 1991. We also extracted a table from the US Census of 2000 [US Census
Bureau 2002] (henceforth USCensus2000). We used attributes 5 to 15 from summary file 3.
The resulting table has 37 million rows.

The column cardinalities for Census1881 range from 138 to 152 882 and from 2 to
99 800 for Census-Income. Our Wikileaks table has column cardinalities 273, 1 440, 3 935
and 4 865. The SSB table has column cardinalities ranging from 1 to 6 084 386. (The fact
table has a column with a single value in it: zero.) For Weather, the column cardinalities
range from 2 to 28 767. Attribute cardinalities for USCensus2000 vary between 130 001 and
534 896.

For each data set, we give the suboptimality factor $\omega$ from Lemma 3.1 in Table IV.
Wikileaks has the lowest factor ($\omega = 1.3$) followed by Census1881 ($\omega = 2.9$) whereas
Census-Income has the largest one ($\omega = 12.4$). Correspondingly, Wikileaks and Census1881
have the fewest columns (4 and 7), and Census-Income has the most (42).

We also provide a simple measure of the statistical dispersion of the frequency of the
values. For column i, we find a most frequent value $v_i$, and we determine what fraction of this
column's values are $v_i$. Averaging the fractions, we have our measure $p_0 = \sum_{i=1}^{c} f(v_i)/(nc)$,
where our table has n rows and c columns and $f(v_i)$ is the number of times $v_i$ occurs in its
column. As an example, consider the table in Fig. 2a. In the first column, the value '6' is a
most frequent value and it appears twice in 11 tuples. In the second column, the value '3' is
most frequent, and it appears 4 times. In this example, we have $p_0 = \frac{2+4}{2 \times 11} \approx 0.27$. In general,
we have that $p_0$ ranges between 0 and 1. For uniformly distributed tables having high
cardinality columns, the fraction $p_0$ is near zero. When there is high statistical dispersion
of the frequencies, we expect that the most frequent values have high frequencies and so
$p_0$ could be close to 1. One benefit of this measure is that it can be computed efficiently.
By this measure, we have the highest statistical dispersion in Census-Income, Weather and
USCensus2000.

7 http://www.google.com/fusiontables/DataSource?dsrcid=224453
8 http://www.cs.umb.edu/~poneil/publist.html

Fig. 8: The Multiple Lists heuristic applied on each partition of a lexicographically
sorted Weather table for various partition sizes: (a) running time; (b) compressed data size.
Both panels use the partition size (rows) as the horizontal axis, from 1 to 1 048 576.
When partitions have size 1, we recover the lexicographical order. We refer to the case
where the partition size is set to 131 072 rows as Multiple Lists* (indicated by a dashed
vertical line in the plots).
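
Computing $p_0$ requires a single frequency count per column. The sketch below uses a hypothetical function name `p0`, and the 11-row table is made up so that its counts match the example in the text (a most frequent value appearing twice in the first column and 4 times in the second); it is not the table from the figure.

```python
from collections import Counter
from typing import Sequence, Tuple


def p0(rows: Sequence[Tuple[int, ...]]) -> float:
    """Average, over the columns, of the fraction taken by a most frequent value."""
    n, c = len(rows), len(rows[0])
    top_counts = sum(
        Counter(row[j] for row in rows).most_common(1)[0][1] for j in range(c)
    )
    return top_counts / (n * c)


# Made-up table: '6' appears twice in column 1, '3' appears 4 times in column 2.
first = [6, 6, 1, 2, 3, 4, 5, 7, 8, 9, 10]
second = [3, 3, 3, 3, 1, 1, 2, 2, 4, 5, 6]
example = list(zip(first, second))
print(round(p0(example), 2))
```

The measure needs one pass over the table and one counter per column, so it remains cheap even for the largest data sets considered here.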

6.3. Implementation 

Because we must be able to process large files in a reasonable time, we selected Vortex 
as one promising row-reordering heuristic. We implemented Vortex in memory-conscious 
manner: intermediate tuples required for a comparison between rows are repeatedly built 
on-the-fly. 

Prior to sorting lexicographically, we reorder columns in order of non-decreasing
cardinality [Lemire and Kaser 2011]: in all cases this was preferable to ordering the columns
in decreasing cardinality. For Vortex, in all cases, there was no significant difference
(< 1%) between ordering the columns in increasing or decreasing order of cardinality.
Effectively, Vortex does not favor a particular column order. (A related property can be
formally proved [Lemire et al. 2012].)
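
The column reordering that precedes the lexicographical sort can be sketched as follows. This is an illustrative in-memory Python version under our own naming (`sort_with_increasing_cardinality` is hypothetical); the experiments in the paper use an external-memory C++ implementation.

```python
def sort_with_increasing_cardinality(rows):
    """Permute the columns by non-decreasing cardinality, then sort the
    rows lexicographically. Returns the column order and the sorted rows."""
    c = len(rows[0])
    # Cardinality of a column = number of distinct values it contains.
    cardinality = [len({row[i] for row in rows}) for i in range(c)]
    # sorted() is stable, so columns with equal cardinality keep their order.
    order = sorted(range(c), key=lambda i: cardinality[i])
    permuted = [tuple(row[i] for i in order) for row in rows]
    return order, sorted(permuted)


table = [(3, "a"), (1, "b"), (2, "a"), (4, "b")]
order, result = sort_with_increasing_cardinality(table)
print(order, result)
```

In this small example the second column has only two distinct values, so it becomes the primary sort key and forms two long runs.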



We used Multiple Lists on partitions of a lexicographically sorted table (see § 3.3).⁹
Fig. [8] shows the effect of the partition size on both the running time and the data
compression. Though larger partitions may improve compression, they also require more time.
As a default, we chose partitions of 131 072 rows. Henceforth, we refer to this heuristic as
Multiple Lists*.
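
The partition-based scheme can be sketched independently of the heuristic applied to each block. In the Python sketch below, the per-block heuristic is a plain greedy nearest-neighbor tour under the Hamming distance; this is a stand-in chosen for brevity, not Multiple Lists itself (which is described in § 3.3), and all function names are hypothetical.

```python
def hamming(a, b):
    """Number of columns in which two rows differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_order(block):
    """Greedy stand-in heuristic (NOT Multiple Lists): start from the first
    row and repeatedly append the closest remaining row."""
    if not block:
        return []
    remaining = list(block)
    tour = [remaining.pop(0)]
    while remaining:
        last = tour[-1]
        j = min(range(len(remaining)), key=lambda k: hamming(last, remaining[k]))
        tour.append(remaining.pop(j))
    return tour


def reorder_by_partitions(rows, partition_size, heuristic=nearest_neighbor_order):
    """Sort lexicographically, then apply the heuristic to each block of
    partition_size consecutive rows."""
    table = sorted(rows)
    out = []
    for start in range(0, len(table), partition_size):
        out.extend(heuristic(table[start:start + partition_size]))
    return out


rows = [(1, 2), (0, 1), (1, 1), (0, 2)]
print(reorder_by_partitions(rows, 4))
```

With a partition size of 1, every block holds a single row and the output is simply the lexicographical order, which matches the left end of the curves in Fig. 8.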

We compiled our C++ software under GNU GCC 4.5.2 using the -O3 flag. The C++
source code is at http://code.google.com/p/rowreorderingcpplibrary/. We ran our
experiments on an Intel Core i7 2600 processor with 16 GB of RAM using Linux. All data
was stored on disk, before and after the compression. We used a 1 TB SATA hard drive
with an estimated sustained reading speed of 115 MB/s. We used an external-memory sort
that we implemented in the conventional manner [Knuth 1997]: we load partitions of rows



9 When reporting the running time, we include the time required to sort the table lexicographically. 
Table V: Compression ratio compared to lexicographical sort

                (a) Census1881               (b) Census-Income
            Vortex   Multiple Lists*     Vortex   Multiple Lists*
Sparse       1.27        1.00             1.00        0.99
Indirect     1.07        1.02             1.03        0.96
Prefix       1.45        0.99             1.15        0.95
LZO          0.99        1.07             0.97        1.04
RLE          1.02        1.13             0.97        1.24
RunCount     0.99        1.13             0.96        1.25

                (c) Wikileaks                (d) SSB (DBGEN)
            Vortex   Multiple Lists*     Vortex   Multiple Lists*
Sparse       1.15        0.94             1.00        0.99
Indirect     0.63        0.96             1.00        0.99
Prefix       1.21        0.91             1.00        0.97
LZO          0.63        0.90             1.00        1.00
RLE          0.71        1.14             1.00        1.00
RunCount     0.70        1.14             1.00        1.00

                (e) Weather                  (f) USCensus2000
            Vortex   Multiple Lists*     Vortex   Multiple Lists*
Sparse       0.80        1.06             1.09        1.00
Indirect     0.78        1.55             1.72        1.06
Prefix       0.74        0.94             1.81        1.08
LZO          0.91        1.96             0.26        0.94
RLE          0.67        1.69             3.08        1.15
RunCount     0.66        1.67             3.04        1.15



which we sort in RAM and write back in the file; we then merge these sorted partitions 
using a priority queue. Our code is sequential. 
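
The external-memory sort described above (sort partitions in RAM, write them out, merge them with a priority queue) can be sketched in Python as follows. This is an illustration under our own assumptions rather than the authors' C++ code: rows are serialized with `pickle` into temporary files, and `heapq.merge` from the standard library performs the heap-based merge. The `key` parameter is where a different row order would be plugged in.

```python
import heapq
import os
import pickle
import tempfile


def _write_run(sorted_rows):
    """Write one sorted partition to a temporary file; return its path."""
    f = tempfile.NamedTemporaryFile(delete=False)
    with f:
        for row in sorted_rows:
            pickle.dump(row, f)
    return f.name


def _read_run(path):
    """Stream the rows of a sorted partition back from disk."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def external_sort(rows, partition_size, key=None):
    """Yield the rows in sorted order, holding at most partition_size
    rows in memory during the partitioning phase."""
    paths, chunk = [], []
    for row in rows:
        chunk.append(row)
        if len(chunk) == partition_size:
            paths.append(_write_run(sorted(chunk, key=key)))
            chunk = []
    if chunk:
        paths.append(_write_run(sorted(chunk, key=key)))
    try:
        # heapq.merge keeps one row per partition in a heap (priority queue).
        yield from heapq.merge(*(_read_run(p) for p in paths), key=key)
    finally:
        for p in paths:
            os.remove(p)


data = [(3, 1), (1, 2), (2, 0), (1, 1), (0, 5)]
print(list(external_sort(data, partition_size=2)))
```

A comparison function, rather than a key, could be adapted with `functools.cmp_to_key`; only the ordering changes, while the partition-and-merge structure stays the same.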

Our algorithms are scalable. Indeed, both Vortex and lexicographical sorting rely on 
the standard external-memory sorting algorithm. The only difference between Vortex and 
lexicographical sort is that the function used to compare tuples is different. This difference 
does not affect scalability with respect to the number of tuples even though it makes Vortex 
slower. As for Multiple Lists*, it relies on external-memory sorting and the repeated
application of Multiple Lists on fixed-size blocks, both of which are scalable.



6.4. Experimental results on realistic data sets 

We present the results in Table [V] giving the compression ratio on top of the lexicographical 
order: e.g., a value of two indicates that the compression ratio is twice as large as what we 
get with the lexicographical order. 

Neither Vortex nor Multiple Lists* was able to improve the compression ratio on the 
SSB data set. In fact, there was no change (within 1%) when replacing the lexicographical 
order with Vortex. And Multiple Lists* made things slightly worse (by 3%) for Prefix 
coding, but left other compression ratios unaffected (within 1%). To interpret this result, 
consider that, while widely used, the DBGEN tool still generates synthetic data. For
example, out of 17 columns, seven have almost perfectly uniform histograms. Yet "real world
data is not uniformly distributed" [Poess and Potapov 2003].

For Census1881, the most remarkable result is that Vortex was able to improve
the compression under Sparse or Prefix coding by 27% and 45%, respectively. For Census-Income,
Multiple Lists* was able to improve RLE compression by 25%. For Wikileaks, we 
found it interesting that Multiple Lists* reduced RunCount (and the RLE output) 
Table VI: Time necessary to reorder the rows

                Lexico.    Vortex     Multiple Lists*
Census1881      2.7 s      14 s       19 s
Census-Income   0.19 s     4.7 s      11 s
Wikileaks       0.3 s      2.0 s      2.5 s
SSB             12 min     35 min     52 min
SSB x2.5        49 min     105 min    154 min
Weather         6 min      26 min     43 min
USCensus2000    33 s       3 min      12 min



by 14% whereas the lexicographical order is already 1.26-optimal, which means that
Multiple Lists* is 1.1-optimal in this case. On Weather, the performance of Vortex
was disappointing: it worsened the compression in all cases. However, Multiple Lists*
had excellent results: it doubled the Lempel-Ziv (LZO) compression, and it improved
RLE compression by 70%. Yet on the USCensus2000 data set, Vortex was preferable to
Multiple Lists* for all compression schemes except LZO. We know that the
lexicographical order is 3.4-optimal at reducing RunCount, yet Vortex is able to reduce
RunCount by a factor of 3 compared to the lexicographical order. It follows that Vortex is
1.1-optimal in this case.
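
The two 1.1-optimality statements follow from a one-line calculation. In the sketch below, the symbols $R_{\mathrm{lex}}$, $R_{h}$ and $R_{\mathrm{opt}}$ (the RunCount under the lexicographical order, under the heuristic, and under an optimal order) and the gain $g$ are our own notation, introduced only for this illustration; the gains are the RunCount entries of Table V.

```latex
% If the lexicographical order is \omega-optimal and the heuristic divides RunCount by g:
R_{\mathrm{lex}} \le \omega \, R_{\mathrm{opt}}
\quad\text{and}\quad
R_{h} = \frac{R_{\mathrm{lex}}}{g}
\;\Longrightarrow\;
R_{h} \le \frac{\omega}{g} \, R_{\mathrm{opt}}.

% Wikileaks, Multiple Lists*:  \omega = 1.26,\ g = 1.14
\frac{1.26}{1.14} \approx 1.11

% USCensus2000, Vortex:  \omega = 3.4,\ g = 3.04
\frac{3.4}{3.04} \approx 1.12
```

Both quotients round to 1.1, in agreement with the figures quoted in the text.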

Overall, both Vortex and Multiple Lists* can significantly improve over the lexi- 
cographic order when a database-compression technique is used on real data. For every 
database-compression technique, significant improvement could be obtained by at least one 
of the reordering heuristics on the real data sets. However, significant degradation could 
also be observed, and lexicographic order was best in four realistic cases (Sparse coding 
on Census-Income, Indirect coding on Wikileaks, Prefix coding on Weather and LZO on 



USCensus2000). In § 6.5, we propose to determine, based on characteristics of the data,
whether significant gains are possible on top of the lexicographical order.

6.4.1. Our row-reordering heuristics are scalable. We present wall-clock timings in Table [VI]
to confirm our claims of scalability. For this test, we included a variation of the SSB 
where we used a scale factor of 100 instead of 40 when generating the data. That is, it 
is 2.5 times larger (henceforth SSB x2.5). As expected, the lexicographical order is fastest, 
whereas Multiple Lists* is slower than either Vortex or the lexicographical order. On 
the largest data set (SSB), Vortex and Multiple Lists* were 3 and 4 times slower than 
lexicographical sorting. One of the benefits of an approach based on partitions, such as 
Multiple Lists*, is that one might stop early if benefits are not apparent after a few 
partitions. When comparing SSB and SSB x2.5, we see that the running time grew by a 
factor of 4 for the lexicographical order, a factor of 2 for Vortex and a factor of 3 for 
Multiple Lists*. For SSB x2.5, the running time of Multiple Lists* included 50 min 
for sorting the table lexicographically, and the application of Multiple Lists on blocks of 
rows only took 104 min. Because Multiple Lists* uses blocks with a fixed size, its running 
time will be eventually dominated by the time required to sort the table lexicographically 
as we increase the number of tuples. 

6.4.2. Better compression improves speed. Everything else being equal, if less data needs to 
be loaded from RAM and disk to the CPU, speed is improved. It remains to assess whether 
improved compression can translate into better speed in practice. Thus, we evaluated how 
fast we could uncompress all of the columns back into the original 32-bit dictionary values. 
Our test was to retrieve the data from disk (with buffering) and store it back on disk. We 
report the ratio of the decompression time with lexicographical sorting over the decompres- 
sion time with alternative row reordering methods. Because the time required to write the 
decompressed values would have been unaffected by the compression, we would not expect 
speed gains exceeding 50% with better compression in this kind of test. 
- First, we look at our good compression results on the Weather data set with
Multiple Lists* for LZO and RLE (resp. 1.96 and 1.69 compression gain). The
decompression speed was improved by a factor of 1.19 and 1.14, respectively.

- Second, we consider the USCensus2000 table, where Vortex improved both Prefix
coding and RLE compression (resp. 1.81 and 3.04 compression gain). We saw gains to the
decompression speed of 1.04 and 1.12.

These speed gains were on top of the gains already achieved by lexicographical sorting.
For example, the decompression speed of Prefix coding was only improved by 4% compared to
the lexicographical order on the USCensus2000 table, but if we compute ratios with respect
to a shuffled table, they went from 1.25 for lexicographical sorting to 1.30 with Vortex.
Hence, the total performance gain due to row reordering is 30%.

6.5. Guidance on selecting the row-reordering heuristic 

It is difficult to determine which row-reordering heuristic is best given a table and a compres- 
sion scheme. Our processing techniques are already fast, and useful guidance would need 
to be obtained faster — probably limiting us to decisions using summaries such as those 
maintained by the DBMS. And such concise summaries might be insufficient: 

- Suppose that we are given a set of columns and complete knowledge of their his- 
tograms. That is, we have the list of attribute values and their frequencies. Unfortunately, 
even given all this data, we could not predict the efficiency of the row reordering techniques 
reliably. Indeed, consider the USCensus2000 data set. According to Table [V], Vortex
improves RLE compression by a factor of 3 over the lexicographical order. Consider what
happened when we took the same table (USCensus2000) and randomly shuffled columns, 
independently. The column histograms were not changed — only the relationships between 
columns were affected. Yet, not only did Vortex fail to improve RLE compression over this 
newly generated table, it made it much worse (from a ratio of 3.04 to 0.74). The performance 
of Multiple Lists* was also adversely affected: while it slightly improves the compression 
by Prefix coding (1.08) over the original USCensus2000 table, it made compression worse 
(0.93) over the reshuffled USCensus2000 table. 

- Perhaps one might hope to predict the efficiency of row-reordering techniques by us- 
ing small samples, without ever sorting the entire table. There are reasons again to be 
pessimistic. We took a random sample of 65 536 tuples from the USCensus2000 table. Over 
such a sample, Vortex improved LZO compression by 2.5% compared to the lexicograph- 
ical order, whereas over the whole data set Vortex makes LZO much worse than the 
lexicographical order (1.025 versus 0.26). Similarly, whereas Vortex improves RLE by a 
factor of 3 when applied over the whole table, the gain was far more modest over our sample 
(1.06 versus 3.04). 
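
The column-shuffling experiment in the first item above rests on the fact that permuting each column independently leaves every column histogram unchanged while destroying the relationships between columns. A minimal Python sketch of that transformation follows; the function names and the seeded random generator are our own choices, not part of the original experiments.

```python
import random
from collections import Counter


def shuffle_columns_independently(rows, seed=0):
    """Permute the values of each column independently of the others."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [tuple(values) for values in zip(*columns)]


def column_histograms(rows):
    """One value-frequency histogram per column."""
    return [Counter(col) for col in zip(*rows)]


# Hypothetical table whose two columns are perfectly correlated.
table = [(i % 3, i % 3) for i in range(30)]
shuffled = shuffle_columns_independently(table)
print(column_histograms(shuffled) == column_histograms(table))
```

Any predictor that sees only the histograms cannot distinguish the original table from the shuffled one, even though the two may behave very differently under row reordering.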

However, we can offer some guidance. For compression schemes that are closely related to
RunCount, such as RLE, the optimality of a lexicographic sort should be computed using
Lemma [3.1]. If $\omega \approx 1$, we can safely conclude that the lexicographical order is
sufficient.



Moreover, our results on synthetic data sets (§pl) suggest that some statistical dispersion 
in the frequencies of the values is necessary. Indeed, we could not improve the RunCount
of tables having uniformly distributed columns even when $\omega$ was relatively large. On our
real data sets, we got the best compression gains compared to the lexicographical order with
the Weather and USCensus2000 tables. They both have high $p_0$ values (0.36 and 0.54).

Hence, we propose to only try better row-reordering heuristics when $\omega$ and $p_0$ are
large (e.g., $\omega > 3$ and $p_0 > 0.3$). Both measures can be computed efficiently.
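
The proposed rule of thumb amounts to a two-threshold test. The Python sketch below encodes it with the example thresholds from the text as defaults; the function name is hypothetical, and the optimality bound is assumed to have been computed beforehand from Lemma 3.1 (not reproduced here).

```python
def worth_trying_reordering(omega, p0, omega_min=3.0, p0_min=0.3):
    """Return True when both the optimality bound of the lexicographical
    order (omega) and the dispersion measure (p0) exceed their thresholds."""
    return omega > omega_min and p0 > p0_min


# USCensus2000: the lexicographical order is 3.4-optimal and p0 = 0.54.
print(worth_trying_reordering(3.4, 0.54))
```

The thresholds are only the example values suggested above; a different workload could call for different cut-offs.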

Furthermore, when applying a scheme such as Multiple Lists on partitions of the sorted 
table, it would be reasonable to stop the heuristic after a few partitions if there is no 
benefit. For example, consider the Weather data and Multiple Lists*. After 20 blocks of 
131 072 tuples, we have a promising gain of 1.6 for LZO and RLE, but a disappointing ratio
of 0.96 for Prefix coding. That is, we have valid estimates (within 10%) of the actual gain 
over the whole data set after processing only 2% of the table. 
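
The early-stopping idea can be sketched as follows: process only the first few blocks of the sorted table and compare a cost measure before and after applying the heuristic. In this Python sketch the cost is RunCount computed block by block, which is our simplifying assumption (the figures quoted above are gains for actual compression schemes), and the function names are hypothetical. The heuristic is passed in as a parameter.

```python
def run_count(rows):
    """Number of runs of identical values, summed over all columns."""
    if not rows:
        return 0
    runs = len(rows[0])  # the first row opens one run per column
    for prev, cur in zip(rows, rows[1:]):
        runs += sum(a != b for a, b in zip(prev, cur))
    return runs


def estimate_gain(sorted_rows, block_size, max_blocks, heuristic):
    """Estimate the RunCount gain of the heuristic from the first
    max_blocks blocks only (runs across block boundaries are ignored)."""
    before = after = 0
    for b in range(max_blocks):
        block = sorted_rows[b * block_size:(b + 1) * block_size]
        if not block:
            break
        before += run_count(block)
        after += run_count(heuristic(block))
    return before / after if after else 1.0


# Toy check: on this sorted block, a Gray-code-like order saves one run.
block = [(0, 0), (0, 1), (1, 0), (1, 1)]
gray_like = lambda rows: [rows[0], rows[1], rows[3], rows[2]]
print(estimate_gain(block, block_size=4, max_blocks=1, heuristic=gray_like))
```

If the estimated gain after a few blocks is close to or below 1, one could keep the lexicographical order and skip the remaining blocks.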

7. CONCLUSION 

For the TSP under the Hamming distance, lexicographical sort is an effective and natu- 
ral heuristic. It appears to be easier to surpass the lexicographical sort when the column 
histograms have high statistical dispersion (e.g. Zipfian distributions). 

Our original question was whether engineers willing to spend extra time reordering rows 
could improve the compressibility of their tables, at least by a modest amount. Our answer 
is positive. 

— Over real data, Multiple Lists* always improved RLE compression when compared to 
the lexicographical order (10% to 70% better). 

— Vortex almost always improved Prefix coding compression, sometimes by a large per- 
centage (80%) compared to the lexicographical order. 

— On one data set, Vortex improved RLE compression by a factor of 3 compared to lexi- 
cographical order. 

As far as heuristics are concerned, we have certainly not exhausted the possibilities.
Several tour-improvement heuristics used to solve the TSP [Johnson and McGeoch 1997]
could be adapted for row reordering. Maybe more importantly, we could adapt the TSP
heuristics using a different distance measure than the Hamming distance. For example,
consider difference coding [Moffat and Stuiver 2000; Bhattacharjee et al. 2009; Anh and
Moffat 2010] where the successive differences between attribute values are coded. In this
case, we could use an inter-row distance that measures the number of bits required to code
the differences. Just as importantly, the implementations of our row-reordering heuristics
are sequential: parallel versions could be faster, especially on multicore processors.
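
To make the suggested alternative distance concrete, here is one possible definition, given purely as an assumption for illustration: for each column, charge the bit length of the absolute difference between the two integer-coded values (zero bits when they are equal). The function names are hypothetical, and a real difference coder would also need to account for signs and for its actual code lengths.

```python
def hamming_distance(a, b):
    """Number of columns in which two rows differ."""
    return sum(x != y for x, y in zip(a, b))


def difference_bits_distance(a, b):
    """Hypothetical cost: per column, the bit length of |a_i - b_i|
    (0 bits when the two values are identical)."""
    return sum(abs(x - y).bit_length() for x, y in zip(a, b))


row_a = (10, 7, 3)
row_b = (11, 7, 35)
print(hamming_distance(row_a, row_b), difference_bits_distance(row_a, row_b))
```

Unlike the Hamming distance, this cost distinguishes a small change (10 to 11, one bit) from a large one (3 to 35, six bits), which is the property a TSP heuristic tuned for difference coding would exploit.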